data_tools – Data Processing Module
Manipulate the baobab data and prepare it for the model.
This module contains functions that normalize and reparametrize the data. It also contains the functions necessary to build a TFDataset that can be used for efficient parallelization in training.
See the script model_trainer.py for examples of how to use these functions.
-
ovejero.data_tools.build_tf_dataset(tf_record_path, lens_params, batch_size, n_epochs, baobab_config_path=None, norm_images=False, shift_pixels=0, shift_params=None, normed_pixel_scale={})
Return a TFDataset for use in training the model.
Parameters: - tf_record_path (str) – The path to the TFRecord file that will be turned into a TFDataset
- lens_params ([str,..]) – A list of strings containing the lens params that were written out as features
- batch_size (int) – The batch size that will be used for training
- n_epochs (int) – The number of training epochs. The dataset object will deal with iterating over the data for repeated epochs.
- baobab_config_path (str) – The path to the baobab config for the dataset. If None, no noise will be added.
- norm_images (bool) – If True, images will be normalized to have std 1.
- shift_pixels (int) – If >0, images will be shifted uniformly between 0 and shift_pixels pixels in the x and y directions (the shifts in the x and y directions are drawn separately).
- shift_params (([str,..],[str,..])) – A tuple of lists of the parameters that must be shifted. The first list contains the x parameters and the second the y. Must be set if shift_pixels is used.
- normed_pixel_scale (dict) – A dict mapping from parameter to the pixel scale (in arcseconds per pixel) for that parameter. Only needs to be set if shift_pixels is being used. If the data was normalized, the pixel scale must also be normalized.
Returns: A TFDataset object for use in training
Return type: (tf.TFDataset)
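The requirement that the pixel scale "must also be normalized" can be illustrated with a short calculation. This is a minimal sketch, assuming a position parameter was normalized as (x - mean) / std and that a pixel shift adds to the parameter; the sign convention inside ovejero may differ. The helper `normalized_pixel_scale`, the parameter name `lens_mass_center_x`, and all numbers are hypothetical and not part of the library.

```python
import math

def normalized_pixel_scale(pixel_scale, std):
    """Pixel scale in units of the normalized parameter (hypothetical helper)."""
    return pixel_scale / std

# Illustrative values only (assumptions, not taken from any dataset).
pixel_scale = 0.08    # arcseconds per pixel
mean, std = 0.1, 0.4  # normalization constants for one x-position parameter
x = 0.25              # unnormalized parameter value in arcseconds
shift = 3             # shift in pixels

# Route 1: shift in physical units, then normalize.
shifted_then_normed = ((x + shift * pixel_scale) - mean) / std

# Route 2: normalize first, then shift using the normalized pixel scale.
normed_then_shifted = (x - mean) / std + shift * normalized_pixel_scale(pixel_scale, std)

# A dict in the shape described for normed_pixel_scale (key name is hypothetical).
normed_pixel_scale = {"lens_mass_center_x": normalized_pixel_scale(pixel_scale, std)}
```

Both routes give the same normalized value, which is why a dataset with normalized parameters would need pixel_scale / std rather than the raw pixel scale.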
-
ovejero.data_tools.gampsi_2_g1g2(lens_param_rat, lens_param_ang, lens_params_path, new_lens_params_path, new_lens_parameter_prefix)
Convert one lens parameter pair of gamma and psi to Cartesian coordinates.
Parameters: - lens_param_rat (str) – The gamma parameter name
- lens_param_ang (str) – The angle parameter name
- lens_params_path (str) – The path to the csv file containing the lens parameters
- new_lens_params_path (str) – The path to the csv file where the old parameters and the new Cartesian components will be written
- new_lens_parameter_prefix (str) – The prefix for the new lens parameter name (for example external_shear)
Notes
New values of parameters will be written to csv file with the names ‘lens new_lens_parameter_prefix name’_e1/e2
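For orientation, the polar-to-Cartesian conversion can be sketched as follows. This assumes the common shear convention g1 = gamma cos(2 psi), g2 = gamma sin(2 psi) with psi in radians; the documentation above does not state the exact formula, so treat the factor of 2 and the units as assumptions. The function `gampsi_to_g1g2` is a hypothetical stand-in that works on single values rather than on a csv file.

```python
import math

def gampsi_to_g1g2(gamma, psi):
    """Convert a polar shear (magnitude gamma, angle psi in radians) to (g1, g2).

    Assumes the spin-2 convention, in which the angle enters as 2 * psi.
    """
    g1 = gamma * math.cos(2.0 * psi)
    g2 = gamma * math.sin(2.0 * psi)
    return g1, g2

# Illustrative input: a shear of magnitude 0.05 at an angle of 45 degrees.
g1, g2 = gampsi_to_g1g2(0.05, math.pi / 4)

# The magnitude is preserved by the conversion.
magnitude = math.hypot(g1, g2)
```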
-
ovejero.data_tools.generate_tf_record(root_path, lens_params, lens_params_path, tf_record_path)
Generate a TFRecord file from a directory of numpy files.
Parameters: - root_path (str) – The path to the folder containing all of the numpy files
- lens_params ([str,..]) – A list of strings containing the lens params that should be written out as features
- lens_params_path (str) – The path to the csv file containing the lens parameters
- tf_record_path (str) – The path to which the tf_record will be saved
-
ovejero.data_tools.normalize_lens_parameters(lens_params, lens_params_path, normalized_param_path, normalization_constants_path, train_or_test='train')
Normalize the lens parameters such that they have mean 0 and standard deviation 1.
Parameters: - lens_params ([str,..]) – A list of strings containing the lens params that should be written out as features
- lens_params_path (str) – The path to the csv file containing the lens parameters
- normalized_param_path (str) – The path to the csv file where the normalized parameters will be written
- normalization_constants_path (str) – The path to the csv file where the mean and std used for normalization will be written / read
- train_or_test (str) – Whether this is a train time or test time operation. At test time the normalization values will be read from the normalization constants file instead of written to it.
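The train/test distinction can be sketched in a few lines: constants are computed once from the training data and then reused unchanged on test data. This is an in-memory illustration, not the library's implementation. The helpers `fit_constants` and `apply_constants` and the parameter name `theta_E` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def fit_constants(rows, lens_params):
    """Train-time step: compute (mean, std) for each parameter."""
    constants = {}
    for param in lens_params:
        values = [row[param] for row in rows]
        # Population std is an assumption; the library may use another estimator.
        constants[param] = (statistics.fmean(values), statistics.pstdev(values))
    return constants

def apply_constants(rows, constants):
    """Normalize rows with previously computed constants (train or test time)."""
    return [
        {param: (row[param] - mean) / std for param, (mean, std) in constants.items()}
        for row in rows
    ]

# Illustrative data only.
train = [{"theta_E": 1.0}, {"theta_E": 2.0}, {"theta_E": 3.0}]
test = [{"theta_E": 2.0}, {"theta_E": 4.0}]

constants = fit_constants(train, ["theta_E"])  # would be written to file at train time
train_normed = apply_constants(train, constants)
test_normed = apply_constants(test, constants)  # constants are read, not recomputed
```

Reusing the training constants means the test set will generally not have mean 0 and standard deviation 1 exactly, but both sets stay on the same scale.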
-
ovejero.data_tools.write_parameters_in_log_space(lens_params, lens_params_path, new_lens_params_path)
Convert lens parameters to log space (important for parameters that cannot be negative).
Parameters: - lens_params ([str,..]) – The parameters that will be converted to log space
- lens_params_path (str) – The path to the csv file containing the lens parameters
- new_lens_params_path (str) – The path to the csv file where the old parameters and the log parameters will be written. Can be the same as lens_params_path.
Notes
New values of parameters will be written to csv file with the name ‘lens parameter name’_log
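The naming rule in the note can be sketched with rows held in memory instead of a csv file. The helper `add_log_columns` and the column names are hypothetical, and the natural logarithm is an assumption, since the documentation does not state the base.

```python
import math

def add_log_columns(rows, lens_params):
    """Return copies of rows with a '<name>_log' entry for each listed parameter."""
    converted = []
    for row in rows:
        new_row = dict(row)  # keep the original parameters alongside the new ones
        for param in lens_params:
            # math.log raises ValueError for values <= 0, so inputs must be positive.
            new_row[param + "_log"] = math.log(row[param])
        converted.append(new_row)
    return converted

# Illustrative data only; just 'theta_E' is converted here.
rows = [{"theta_E": 1.0, "gamma": 2.0}]
converted = add_log_columns(rows, ["theta_E"])
```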