1. DldProcessor class (DldProcessor)
- class processor.DldProcessor.DldProcessor(settings=None, silent=False)[source]
This class simplifies the analysis of data files recorded during the beamtime of August 2017. It converts the electron events from the DLD into a clean table and uses dask for binning and further analysis.
- Attributes (loaded from SETTINGS.ini)
- N_CORES: int
The number of available CPU cores to use.
- CHUNK_SIZE: int
Size of the chunks in which a parquet file will be divided.
- TOF_STEP_NS: float
The step size in ns of the dldTime. Used to convert the step number to the ToF time in the delay line detector.
- TOF_STEP_EV: float
The step size in eV of the dldTime. Used to convert the step number to the energy of the photoemitted electrons.
- DATA_RAW_DIR: str
Path to the raw data hdf5 files output by FLASH.
- DATA_PARQUET_DIR: str
Path to where parquet files are stored.
- DATA_H5_DIR: str
Path to where hdf5 files containing binned data are stored.
- DATA_RESULTS_DIR: str
Path to the default saving location for results in np.array, tiff stack formats, etc.
- addBinning(name, start, end=None, steps=None, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False, compute_histograms=False)[source]
Add binning of one dimension, to be then computed with the
computeBinnedData
method. Creates a list of bin names (binNameList) to identify the axes on which to bin the data. The dimensions of the output array will be ordered as in this list. The attribute binRangeList will contain the ranges of the binning used for the corresponding dimension.
Parameters
- name: str
Name of the column to apply binning to. Possible column names are: posX, posY, dldTime, pumpProbeTime, dldDetector, etc.
- start: float
Position of first bin
- end: float
Position of last bin (not included!)
- steps: float
The bin size. If useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.
- useStepSize: bool | True
Tells Python how to interpret steps.
- True
interpret steps as a step size.
- False
interpret steps as the number of steps.
- forceEnds: bool | False
Tells Python to give priority to the end parameter rather than the steps parameter (see genBins for more info).
- include_last: bool | True
If True, closes the interval on the right. When the step size has priority, the interval is expanded to include the next value if True, or shrunk to contain only the points within the bounds if False.
- force_legacy: bool | False
- True
use np.arange method to generate bins.
- False
use np.linspace method to generate bins.
Return
- axes: numpy array
axis of the binned dimension. The points defined on this axis are the middle points of each bin.
Note
If the name is ‘pumpProbeTime’: sets self.delaystageHistogram for normalization.
See also
computeBinnedData
Method to compute all bins created with this function.
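The relation between the scheduled bins and the returned axis can be sketched as follows (a minimal numpy illustration with hypothetical values, not tied to any real data column): the axis contains the midpoints of the generated bins.

```python
import numpy as np

# Hypothetical bins: start=0, end=10, step size 2 (as if passed to addBinning).
edges = np.arange(0, 10 + 2, 2)          # bin edges: [0, 2, 4, 6, 8, 10]
axes = edges[:-1] + np.diff(edges) / 2   # returned axis = bin midpoints
print(axes)  # [1. 3. 5. 7. 9.]
```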
- add_workflow_step(workflow_step, name=None, overwrite=False, *args, **kwargs)[source]
Adds one or more steps to the workflow.
- Args:
- workflow_step: dict | str
This is the main descriptor of the workflow step. If a dictionary is passed, it expects a workflow-parameter-like dictionary: {'method': method, 'args': [], 'kwargs': {}}. Otherwise, if a string is passed, it should be the name of the method of the workflow step.
- name: str
name to describe the workflow step (key in the workflow_parameters dictionary)
- overwrite: bool
optional setting to overwrite previous entries with the same name.
- args:
arguments to pass to the method.
- kwargs:
keyword arguments to pass to the method.
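The dictionary form of a workflow step described above can be sketched as follows ('applyFilter' and 'threshold' are assumed names for illustration only, not part of the actual API):

```python
# Hypothetical workflow step in the {'method': ..., 'args': [], 'kwargs': {}} form.
step = {
    'method': 'applyFilter',        # name of the workflow-step method (assumed)
    'args': [],                     # positional arguments passed on execution
    'kwargs': {'threshold': 0.5},   # keyword arguments passed on execution
}
```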
- appendDataframeParquet(fileName, path=None)[source]
Append data to an existing dask Parquet dataframe.
This can be used to concatenate multiple DAQ runs in one dataframe. Data is taken from the dd and dd_microbunch dataframe attributes.
- Parameter
- fileName: str
name (including path) of the folder containing the parquet files to which the new data is appended.
- property binnedArrayShape
- property binnedArraySize
- computeBinnedData(saveAs=None, return_xarray=None, force_64bit=False, skip_metadata=False, compute_histograms=False, fast_metadata=False, usePbar=True)[source]
Use the bin list to bin the data.
Parameters
- saveAs: str | None
full file name (including path) under which to save the result (forces return_xarray to be True).
- return_xarray: bool
if True, returns an xarray with all available axes and metadata information attached; otherwise a numpy.array.
Returns
- result: numpy.array or xarray.DataArray
A numpy array of float64 values. The number of bins defined determines the dimensionality of the array.
Notes
postProcess method must be used before computing the binned data if binning along pumpProbeDelay or polar k-space coordinates.
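Conceptually, the binning reduces the electron table to a multidimensional histogram over the scheduled axes. A self-contained numpy sketch with toy data (not the dask implementation used by the class):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy electron table: the two columns play the role of posX and dldTime
pos_x = rng.uniform(0, 10, size=1000)
dld_time = rng.uniform(0, 5, size=1000)

# bin edges as if scheduled with addBinning('posX', 0, 10, 2)
# and addBinning('dldTime', 0, 5, 1)
edges_x = np.arange(0, 10 + 2, 2)
edges_t = np.arange(0, 5 + 1, 1)

sample = np.column_stack((pos_x, dld_time))
counts, _ = np.histogramdd(sample, bins=(edges_x, edges_t))
# output dimensions are ordered as the binnings were added: (posX, dldTime)
```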
- computeBinnedDataMulti(saveName=None, savePath=None, rank=None, size=None)[source]
Use the bin list to bin the data. Cluster-compatible version (Maciej Dendzik)
Parameters
- saveName: str | None
filename
- savePath: str | None
file path
- rank: int | None
Rank number of cluster computer
- size: int | None
Size of partition
Return
- result: numpy array
A numpy array of float64 values. The number of bins determines the dimensionality of the output array.
Notes
postProcess method must be used before computing the binned data if binning is applied along pumpProbeDelay or polar k-space coordinates.
- property delayStageHistogram
Easy access to the pump probe normalization array. Kept mostly for backward compatibility.
- genBins(start, end, steps, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False)[source]
Creates bins for use by binning functions. Can also be used to generate x axes.
Binning is created using np.linspace (formerly done with np.arange). The implementation allows choosing between setting a step size (useStepSize=True, default) or a number of bins (useStepSize=False).
In general, it is not possible to satisfy all three parameters: start, end, and steps. For this reason, you can choose to give priority to the step size or to the interval size. If forceEnds=False, the steps parameter is given priority and the end parameter is redefined, so the interval can actually be larger than expected. If forceEnds=True, the step size is not enforced, and the interval is divided by the closest step that divides it cleanly. This of course only matters when choosing steps that do not cleanly divide the interval.
Parameters
- start: float
Position of first bin
- end: float
Position of last bin (not included!)
- steps: float
Defines the bin size. If useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.
- useStepSize: bool | True
Tells Python to interpret steps as a step size if True, or as the number of steps if False.
- forceEnds: bool | False
Tells Python to give priority to the end parameter rather than the steps parameter (see above for more info).
- include_last: bool | True
If True, closes the interval on the right. When the step size has priority, the interval is expanded to include the next value if True, or shrunk to contain only the points within the bounds if False.
- force_legacy: bool | False
If True, imposes the old method for generating bins, based on np.arange instead of np.linspace.
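The priority logic described above can be sketched as follows (a simplified, illustrative re-implementation; the actual genBins may differ in edge cases):

```python
import numpy as np

def gen_bins_sketch(start, end, steps, use_step_size=True, force_ends=False):
    """Illustrative sketch of the documented genBins priority logic."""
    if use_step_size:
        if force_ends:
            # interval has priority: use the number of steps that fits most closely
            n = max(1, round((end - start) / steps))
            return np.linspace(start, end, n + 1)
        # step size has priority: the end may grow to the next multiple of steps
        n = int(np.ceil((end - start) / steps))
        return np.linspace(start, start + n * steps, n + 1)
    # steps is interpreted as the number of bins
    return np.linspace(start, end, int(steps) + 1)
```

For example, a step size of 3 on the interval [0, 10] expands the end to 12 by default, while force_ends=True keeps the interval fixed and adjusts the step.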
- initAttributes(import_all=False)[source]
Parse settings file and assign the variables.
- Args:
- import_all: bool | False
Selects which settings entries to import.
- True
imports all entries in SETTINGS.ini except those from the [DAQ channels] section; these are handled in the DldFlashProcessor subclass. Prints a warning when an entry is not found as a class attribute.
- False
only imports entries that match existing attribute names.
- Warnings:
- UserWarning:
when an entry is found in the SETTINGS.ini file which is not present as a pre-defined class attribute, it warns the user to add it to the code.
- loadSettings(settings_file_name, preserve_path=True)[source]
Load settings from another saved settings file.
To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Use this name in this method to load its content into the SETTINGS.ini file.
- Args:
- settings_file_name: str
Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.
- preserve_path: bool | True
Disables overwriting local file saving paths. Defaults to True.
- static load_binned(file_name, mode='r', ret_type='list')[source]
Load an HDF5 file saved with the save_binned() method. Wrapper for the function utils.load_binned_h5().
Parameters
- file_name: str
name of the file to load, including full path
- mode: str | ‘r’
Read mode of h5 file (‘r’ = read).
- ret_type: str | ‘list’,’dict’
output format for axes and histograms: ‘list’ generates a list of arrays, ordered as the corresponding dimensions in data. ‘dict’ generates a dictionary with the names of each axis.
Returns
- data: numpy array
Multidimensional data read from h5 file.
- axes: numpy array
The axes values associated with the read data.
- hist: numpy array
Histogram values associated with the read data.
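The difference between the two ret_type values can be illustrated with hypothetical axes (values invented for illustration, not read from any file):

```python
# Hypothetical axes for a file binned along ['posX', 'posY'].
names = ['posX', 'posY']
axes_list = [[0, 1, 2], [10, 20]]        # ret_type='list': ordered as the data dimensions
axes_dict = dict(zip(names, axes_list))  # ret_type='dict': keyed by axis name
```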
- property metadata
- normalizeAxisMean(data_array, ax)[source]
Normalize to the mean of the given axis.
- Args:
- data_array: np.ndarray
array to be normalized
- ax: int
axis along which to normalize
- Returns:
- norm_array: np.ndarray
normalized array
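A minimal numpy sketch of this normalization (assuming a plain ndarray input; the actual method may handle more cases):

```python
import numpy as np

def normalize_axis_mean_sketch(data_array, ax):
    """Divide by the mean along axis `ax`, broadcasting back to the full shape."""
    return data_array / data_array.mean(axis=ax, keepdims=True)
```

After normalization, the mean along the chosen axis is 1 everywhere.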
- normalizeDelay(data_array, ax=None, preserve_mean=True)[source]
Normalizes the data array to the number of counts per delay stage step.
- Parameter
- data_array: numpy array
data array containing binned data, as created by the computeBinnedData method.
- ax: str
name of the axis to normalize to. Default is None, which uses “pumpProbeTime” as the normalization axis if found, otherwise “delayStage”.
- preserve_mean: bool | True
if True, divides the histogram by its mean, preserving the average value of the normalized array.
- Raise
Throws a ValueError when no pump probe time delay axis is available.
- Return
- data_array_normalized: numpy array
normalized version of the input array.
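The core of this normalization can be sketched as dividing the binned data by the per-delay-step counts histogram (a simplified stand-alone version; the actual method obtains the histogram internally from the binned run):

```python
import numpy as np

def normalize_delay_sketch(data_array, counts, ax=0, preserve_mean=True):
    """Divide binned data by the counts histogram along axis `ax`."""
    hist = counts.astype(float)
    if preserve_mean:
        # rescale the histogram so the overall magnitude of the data is kept
        hist = hist / hist.mean()
    shape = [1] * data_array.ndim
    shape[ax] = len(hist)
    return data_array / hist.reshape(shape)
```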
- normalizeGMD(data_array, axis_name, axis_values)[source]
Creates the normalization array for normalizing to the FEL intensity measured at the GMD.
- Parameter
- data_array: np.ndarray
data to be normalized
- axis_name:
name of the axis along which to perform normalization
- axis_values:
the bins of the axis_name provided.
- Return
- normalized_array: np.ndarray
normalized version of the input array
- normalizePumpProbeTime(data_array, ax='pumpProbeTime', preserve_mean=False)[source]
Normalizes the data array to the number of counts per delay stage step.
[DEPRECATED] This function is buggy; use the new normalizeDelay function instead.
- Parameter
- data_array: numpy array
data array containing binned data, as created by the computeBinnedData method.
- ax: str | ‘pumpProbeTime’
axis name
- Raise
Throws a ValueError when no pump probe time delay axis is available.
- Return
- data_array_normalized: numpy array
normalized version of the input array.
- property pumpProbeTimeHistogram
Easy access to the pump probe normalization array. Kept mostly for backward compatibility.
- readDataframes(fileName=None, path=None, format='parquet')[source]
Load data from a parquet or HDF5 dataframe.
Access the data as hdf5 file (this is the format used internally, NOT the FLASH HDF5 files obtained from the DAQ system!)
- Parameters
- fileName: str | None
Shared namestring of the data file.
- path: str | None
name of the filepath (down to the lowest-level folder); defaults to self.DATA_PARQUET_DIR or self.DATA_H5_DIR.
- format: str | ‘parquet’
file format: ‘parquet’ (parquet file), ‘h5’ or ‘hdf5’ (hdf5 file).
- readDataframesParquet(fileName=None)[source]
[DEPRECATED] Load data from a dask Parquet dataframe. Use readDataframes instead.
Parameter
- fileName: str | None
name (including path) of the folder containing parquet files where the data was saved.
- read_h5(h5FilePath)[source]
[DEPRECATED] Read the h5 file at given path and return the contained data.
- Parameters
- h5FilePath: str
Path to the h5 file to read.
- Returns
- result: np.ndarray
array containing the binned data.
- axes: dict
dictionary with axes data labeled by axes name.
- histogram: np.array
array of time normalization data.
- remove_workflow_step(name)[source]
Removes a step from the workflow.
- Args:
- name: str | iterable of str
the key(s) which identify the workflow step(s) to remove.
- root_folder = '/home/docs/checkouts/readthedocs.org/user_builds/hextof-processor/conda/latest/lib/python3.8/site-packages'
- property sample
- save2hdf5(binnedData, path=None, filename='default.hdf5', normalizedData=None, overwrite=False)[source]
[DEPRECATED] Store the binned data in a hdf5 file.
- Parameters
- binnedData: pd.DataFrame
binned data with bins in dldTime, posX, and posY (and, if to be normalized, binned in detectors).
- filename: str | ‘default.hdf5’
name of the file.
- path: str | None
path to the location where to save the hdf5 file. If None, uses the default value defined in SETTINGS.ini.
- normalizedData: np.ndarray | None
normalized data for both detectors, so it should be a 3D array (posX, posY, detectorID).
- overwrite: bool | False
- True
overwrites existing files with matching filename.
- False
no overwriting of files.
- Example
Normalization given; for example, take it from run 18440:

    processor.readRun(18440)
    processor.addBinning('posX', 500, 1000, 2)
    processor.addBinning('posY', 500, 1000, 2)
    processor.addBinning('dldDetectorId', -1, 2, 1)
    norm = processor.computeBinnedData()
    norm = np.nan_to_num(norm)
    norm[norm < 10] = 1  # 10 or smaller seems to be outside of the detector
    norm[:, :, 0][norm[:, :, 0] >= 10] /= norm[:, :, 0][norm[:, :, 0] >= 10].mean()
    norm[:, :, 1][norm[:, :, 1] >= 10] /= norm[:, :, 1][norm[:, :, 1] >= 10].mean()
    norm[norm < 0.05] = 0.1

- Raises
- Exception (“Wrong dimension”)
if the data from binnedData has dimensions different from 4.
- save_binned(binnedData, file_name, saveFrames=True, path=None, mode='w')[source]
Save a binned numpy array to h5 file. The file includes the axes (taken from the scheduled bins) and the delay stage histogram, if it exists.
Parameters
- binnedData: numpy array
Binned multidimensional data.
- file_name: str
Name of the saved file. The extension ‘.h5’ is automatically added.
- path: str | None
File path.
- mode: str | ‘w’
Write mode of h5 file (‘w’ = write).
- setDefaultSettings(settings_file_name, preserve_path=True)[source]
Load settings from another saved settings file.
To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Use this name in this method to load its content into the SETTINGS.ini file.
- Args:
- settings_file_name: str
Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.
- preserve_path: bool | True
Disables overwriting local file saving paths. Defaults to True.
- property settings
Easy access to settings.ini file
- Returns:
dictionary with the settings file structure
- property shape
- property size
- property size_mb
- update_metadata(compute_histograms=True, fast_mode=False)[source]
Creates a dictionary with the most relevant metadata.
Args
- fast_mode: bool | False
if True, skips the heavy computation steps which take a long time.
Returns
- metadata: dict
dictionary with metadata information
- # TODO: distribute metadata generation in the appropriate methods.
This can be done as with “sample” and “workflow parameters”: they have their own dictionaries independent from metadata, and metadata only collects these into one dict to be used or stored.
- workflow()[source]
Decorator function to automatically save workflow parameters.
When a method is decorated with this decorator, the parameters passed when calling it are automatically added to the workflow_parameters dictionary.
- Warning:
only explicitly passed arguments get saved. Default values will not be recorded.
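The recording behavior described above can be sketched with a minimal decorator (a simplified stand-in, not the actual implementation; `shift_energy` is a hypothetical method name):

```python
import functools

workflow_parameters = {}  # stands in for the processor's workflow parameter dictionary

def workflow_sketch(func):
    """Record only the arguments the caller actually passed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # defaults are NOT recorded: only explicitly passed values appear here
        workflow_parameters[func.__name__] = {'args': list(args), 'kwargs': dict(kwargs)}
        return func(*args, **kwargs)
    return wrapper

@workflow_sketch
def shift_energy(offset=0.0):  # hypothetical workflow step
    return offset
```

Calling `shift_energy(offset=1.2)` records `{'args': [], 'kwargs': {'offset': 1.2}}`, while calling `shift_energy()` records empty parameters, since the default value is never captured.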
- property workflow_parameters
Safe access to the workflow parameter dictionary