1. DldProcessor class (DldProcessor)
- class processor.DldProcessor.DldProcessor(settings=None, silent=False)[source]
This class simplifies the analysis of data files recorded during the beamtime of August 2017. It converts the electron events from the DLD into a clean table and uses dask for binning and further analysis.
- Attributes (loaded from SETTINGS.ini)
- N_CORES: int
The number of available CPU cores to use.
- CHUNK_SIZE: int
Size of the chunks in which a parquet file will be divided.
- TOF_STEP_NS: float
The step size in ns of the dldTime. Used to convert the step number to the ToF time in the delay line detector.
- TOF_STEP_EV: float
The step size in eV of the dldTime. Used to convert the step number to the energy of the photoemitted electrons.
- DATA_RAW_DIR: str
Path to the raw data hdf5 files output by FLASH.
- DATA_PARQUET_DIR: str
Path to where parquet files are stored.
- DATA_H5_DIR: str
Path to where hdf5 files containing binned data are stored.
- DATA_RESULTS_DIR: str
Path to the default saving location for results in np.array, tiff stack formats, etc.
- addBinning(name, start, end=None, steps=None, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False, compute_histograms=False)[source]
Add binning of one dimension, to be then computed with the
computeBinnedData
method. Creates a list of bin names (binNameList) to identify the axes on which to bin the data. The dimensions of the output array will be ordered as in this list. The attribute binRangeList will contain the ranges of the binning used for the corresponding dimension.
Parameters
- name: str
Name of the column to apply binning to. Possible column names are: posX, posY, dldTime, pumpProbeTime, dldDetector, etc.
- start: float
Position of first bin
- end: float
Position of last bin (not included!)
- steps: float
The bin size. If useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.
- useStepSize: bool | True
Tells Python how to interpret steps.
- True
interpret steps as a step size.
- False
interpret steps as the number of steps.
- forceEnds: bool | False
Tells Python to give priority to the end parameter rather than the steps parameter (see genBins for more info).
- include_last: bool | True
If True, closes the interval on the right. When the step size has priority, the interval is expanded to include the next value if True, or shrunk to contain only the points within the bounds if False.
- force_legacy: bool | False
- True
use np.arange method to generate bins.
- False
use np.linspace method to generate bins.
Return
- axes: numpy array
axis of the binned dimension. The points defined on this axis are the middle points of each bin.
Note
If the name is ‘pumpProbeTime’: sets self.delaystageHistogram for normalization.
See also
computeBinnedData
Method to compute all bins created with this function.
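The relation between the scheduled bins and the returned axis can be sketched as follows (a minimal numpy illustration with hypothetical values, not tied to any real data column): the axis contains the midpoints of the generated bins.

```python
import numpy as np

# Hypothetical bins: start=0, end=10, step size 2 (as if passed to addBinning).
edges = np.arange(0, 10 + 2, 2)          # bin edges: [0, 2, 4, 6, 8, 10]
axes = edges[:-1] + np.diff(edges) / 2   # returned axis = bin midpoints
print(axes)  # [1. 3. 5. 7. 9.]
```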
- add_workflow_step(workflow_step, name=None, overwrite=False, *args, **kwargs)[source]
Adds one or more steps to the workflow.
- Args:
- workflow_step: dict | str
This is the main descriptor of the workflow step. If a dictionary is passed, it expects a workflow-parameter-like dictionary: {'method': method, 'args': [], 'kwargs': {}}. Otherwise, if a string is passed, it should be the name of the method of the workflow step.
- name: str
name to describe the workflow step (key in the workflow_parameters dictionary)
- overwrite: bool
optional setting to overwrite previous entries with the same name.
- args:
arguments to pass to the method.
- kwargs:
keyword arguments to pass to the method.
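The dictionary form of a workflow step described above can be sketched as follows ('applyFilter' and 'threshold' are assumed names for illustration only, not part of the actual API):

```python
# Hypothetical workflow step in the {'method': ..., 'args': [], 'kwargs': {}} form.
step = {
    'method': 'applyFilter',        # name of the workflow-step method (assumed)
    'args': [],                     # positional arguments passed on execution
    'kwargs': {'threshold': 0.5},   # keyword arguments passed on execution
}
```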
- appendDataframeParquet(fileName, path=None)[source]
Append data to an existing dask Parquet dataframe.
This can be used to concatenate multiple DAQ runs in one dataframe. Data is taken from the dd and dd_microbunch dataframe attributes.
- Parameter
- fileName: str
name (including path) of the folder containing the parquet files to which the new data is appended.
- property binnedArrayShape
- property binnedArraySize
- computeBinnedData(saveAs=None, return_xarray=None, force_64bit=False, skip_metadata=False, compute_histograms=False, fast_metadata=False, usePbar=True)[source]
Use the bin list to bin the data.
Parameters
- saveAs: str | None
full file name (including path) under which to save the result (forces return_xarray to be True).
- return_xarray: bool
if True, returns an xarray with all available axes and metadata information attached; otherwise a numpy.array.
Returns
- result: numpy.array or xarray.DataArray
A numpy array of float64 values. The number of bins defined determines the dimensionality of the array.
Notes
postProcess method must be used before computing the binned data if binning along pumpProbeDelay or polar k-space coordinates.
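Conceptually, the binning reduces the electron table to a multidimensional histogram over the scheduled axes. A self-contained numpy sketch with toy data (not the dask implementation used by the class):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy electron table: the two columns play the role of posX and dldTime
pos_x = rng.uniform(0, 10, size=1000)
dld_time = rng.uniform(0, 5, size=1000)

# bin edges as if scheduled with addBinning('posX', 0, 10, 2)
# and addBinning('dldTime', 0, 5, 1)
edges_x = np.arange(0, 10 + 2, 2)
edges_t = np.arange(0, 5 + 1, 1)

sample = np.column_stack((pos_x, dld_time))
counts, _ = np.histogramdd(sample, bins=(edges_x, edges_t))
# output dimensions are ordered as the binnings were added: (posX, dldTime)
```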
- computeBinnedDataMulti(saveName=None, savePath=None, rank=None, size=None)[source]
Use the bin list to bin the data. Cluster-compatible version (Maciej Dendzik)
Parameters
- saveName: str | None
filename
- savePath: str | None
file path
- rank: int | None
Rank number of cluster computer
- size: int | None
Size of partition
Return
- result: numpy array
A numpy array of float64 values. The number of bins determines the dimensionality of the output array.
Notes
postProcess method must be used before computing the binned data if binning is applied along pumpProbeDelay or polar k-space coordinates.
- property delayStageHistogram
Easy access to the pump probe normalization array. Kept mostly for backward compatibility.
- genBins(start, end, steps, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False)[source]
Creates bins for use by binning functions. Can also be used to generate x axes.
Binning is created using np.linspace (formerly done with np.arange). The implementation allows choosing between setting a step size (useStepSize=True, default) or a number of bins (useStepSize=False).
In general, it is not possible to satisfy all three parameters: start, end, and steps. For this reason, you can choose to give priority to the step size or to the interval size. If forceEnds=False, the steps parameter is given priority and the end parameter is redefined, so the interval can actually be larger than expected. If forceEnds=True, the step size is not enforced, and the interval is divided by the closest step that divides it cleanly. This of course only matters when choosing steps that do not cleanly divide the interval.
Parameters
- start: float
Position of first bin
- end: float
Position of last bin (not included!)
- steps: float
Defines the bin size. If useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.
- useStepSize: bool | True
Tells Python to interpret steps as a step size if True, or as the number of steps if False.
- forceEnds: bool | False
Tells Python to give priority to the end parameter rather than the steps parameter (see above for more info).
- include_last: bool | True
If True, closes the interval on the right. When the step size has priority, the interval is expanded to include the next value if True, or shrunk to contain only the points within the bounds if False.
- force_legacy: bool | False
If True, imposes the old method for generating bins, based on np.arange instead of np.linspace.
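The priority logic described above can be sketched as follows (a simplified, illustrative re-implementation; the actual genBins may differ in edge cases):

```python
import numpy as np

def gen_bins_sketch(start, end, steps, use_step_size=True, force_ends=False):
    """Illustrative sketch of the documented genBins priority logic."""
    if use_step_size:
        if force_ends:
            # interval has priority: use the number of steps that fits most closely
            n = max(1, round((end - start) / steps))
            return np.linspace(start, end, n + 1)
        # step size has priority: the end may grow to the next multiple of steps
        n = int(np.ceil((end - start) / steps))
        return np.linspace(start, start + n * steps, n + 1)
    # steps is interpreted as the number of bins
    return np.linspace(start, end, int(steps) + 1)
```

For example, a step size of 3 on the interval [0, 10] expands the end to 12 by default, while force_ends=True keeps the interval fixed and adjusts the step.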
- initAttributes(import_all=False)[source]
Parse settings file and assign the variables.
- Args:
- import_all: bool | False
Selects which settings entries to import.
- True
imports all entries in SETTINGS.ini except those from the [DAQ channels] section; these are handled in the DldFlashProcessor subclass. Prints a warning when an entry is not found as a class attribute.
- False
only imports entries that match existing attribute names.
- Warnings:
- UserWarning:
when an entry is found in the SETTINGS.ini file which is not present as a pre-defined class attribute, it warns the user to add it to the code.
- loadSettings(settings_file_name, preserve_path=True)[source]
Load settings from another saved settings file.
To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Use this name in this method to load its content into the SETTINGS.ini file.
- Args:
- settings_file_name: str
Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.
- preserve_path: bool | True
Disables overwriting local file saving paths. Defaults to True.
- static load_binned(file_name, mode='r', ret_type='list')[source]
Load an HDF5 file saved with the save_binned() method. Wrapper for the function utils.load_binned_h5().
Parameters
- file_name: str
name of the file to load, including full path
- mode: str | ‘r’
Read mode of h5 file (‘r’ = read).
- ret_type: str | ‘list’,’dict’
output format for axes and histograms: ‘list’ generates a list of arrays, ordered as the corresponding dimensions in data. ‘dict’ generates a dictionary with the names of each axis.
Returns
- data: numpy array
Multidimensional data read from h5 file.
- axes: numpy array
The axes values associated with the read data.
- hist: numpy array
Histogram values associated with the read data.
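The difference between the two ret_type values can be illustrated with hypothetical axes (values invented for illustration, not read from any file):

```python
# Hypothetical axes for a file binned along ['posX', 'posY'].
names = ['posX', 'posY']
axes_list = [[0, 1, 2], [10, 20]]        # ret_type='list': ordered as the data dimensions
axes_dict = dict(zip(names, axes_list))  # ret_type='dict': keyed by axis name
```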
- property metadata
- normalizeAxisMean(data_array, ax)[source]
Normalize to the mean of the given axis.
- Args:
- data_array: np.ndarray
array to be normalized
- ax: int
axis along which to normalize
- Returns:
- norm_array: np.ndarray
normalized array
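A minimal numpy sketch of this normalization (assuming a plain ndarray input; the actual method may handle more cases):

```python
import numpy as np

def normalize_axis_mean_sketch(data_array, ax):
    """Divide by the mean along axis `ax`, broadcasting back to the full shape."""
    return data_array / data_array.mean(axis=ax, keepdims=True)
```

After normalization, the mean along the chosen axis is 1 everywhere.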
- normalizeDelay(data_array, ax=None, preserve_mean=True)[source]
Normalizes the data array to the number of counts per delay stage step.
- Parameter
- data_array: numpy array
data array containing binned data, as created by the computeBinnedData method.
- ax: str
name of the axis to normalize to. Default is None, which uses “pumpProbeTime” as the normalization axis if found, otherwise “delayStage”.
- preserve_mean: bool | True
if True, divides the histogram by its mean, preserving the average value of the normalized array.
- Raise
Throws a ValueError when no pump probe time delay axis is available.
- Return
- data_array_normalized: numpy array
normalized version of the input array.
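The core of this normalization can be sketched as dividing the binned data by the per-delay-step counts histogram (a simplified stand-alone version; the actual method obtains the histogram internally from the binned run):

```python
import numpy as np

def normalize_delay_sketch(data_array, counts, ax=0, preserve_mean=True):
    """Divide binned data by the counts histogram along axis `ax`."""
    hist = counts.astype(float)
    if preserve_mean:
        # rescale the histogram so the overall magnitude of the data is kept
        hist = hist / hist.mean()
    shape = [1] * data_array.ndim
    shape[ax] = len(hist)
    return data_array / hist.reshape(shape)
```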
- normalizeGMD(data_array, axis_name, axis_values)[source]
Creates the normalization array for normalizing to the FEL intensity measured at the GMD.
- Parameter
- data_array: np.ndarray
data to be normalized
- axis_name:
name of the axis along which to perform normalization
- axis_values:
the bins of the axis_name provided.
- Return
- normalized_array: np.ndarray
normalized version of the input array
- normalizePumpProbeTime(data_array, ax='pumpProbeTime', preserve_mean=False)[source]
Normalizes the data array to the number of counts per delay stage step.
[DEPRECATED] This function is buggy; use the new normalizeDelay function instead.
- Parameter
- data_array: numpy array
data array containing binned data, as created by the computeBinnedData method.
- ax: str | ‘pumpProbeTime’
axis name
- Raise
Throws a ValueError when no pump probe time delay axis is available.
- Return
- data_array_normalized: numpy array
normalized version of the input array.
- property pumpProbeTimeHistogram
Easy access to the pump probe normalization array. Kept mostly for backward compatibility.
- readDataframes(fileName=None, path=None, format='parquet')[source]
Load data from a parquet or HDF5 dataframe.
Access the data as hdf5 file (this is the format used internally, NOT the FLASH HDF5 files obtained from the DAQ system!)
- Parameters
- fileName: str | None
Shared namestring of the data file.
- path: str | None
name of the filepath (down to the lowest-level folder); defaults to self.DATA_PARQUET_DIR or self.DATA_H5_DIR.
- format: str | ‘parquet’
file format: ‘parquet’ (parquet file), ‘h5’ or ‘hdf5’ (hdf5 file).
- readDataframesParquet(fileName=None)[source]
[DEPRECATED] Load data from a dask Parquet dataframe. Use readDataframes instead.
Parameter
- fileName: str | None
name (including path) of the folder containing parquet files where the data was saved.
- read_h5(h5FilePath)[source]
[DEPRECATED] Read the h5 file at given path and return the contained data.
- Parameters
- h5FilePath: str
Path to the h5 file to read.
- Returns
- result: np.ndarray
array containing the binned data.
- axes: dict
dictionary with axes data labeled by axes name.
- histogram: np.array
array of time normalization data.
- remove_workflow_step(name)[source]
Removes a step from the workflow.
- Args:
- name: str | iterable of str
the key(s) which identify the workflow step(s) to remove.
- root_folder = '/home/docs/checkouts/readthedocs.org/user_builds/hextof-processor/conda/latest/lib/python3.8/site-packages'
- property sample
- save2hdf5(binnedData, path=None, filename='default.hdf5', normalizedData=None, overwrite=False)[source]
[DEPRECATED] Store the binned data in a hdf5 file.
- Parameters
- binnedData: pd.DataFrame
binned data with bins in dldTime, posX, and posY (and, if to be normalized, binned in detectors).
- filename: str | ‘default.hdf5’
name of the file.
- path: str | None
path to the location where to save the hdf5 file. If None, uses the default value defined in SETTINGS.ini.
- normalizedData: np.ndarray | None
normalized data for both detectors, so it should be a 3D array (posX, posY, detectorID).
- overwrite: bool | False
- True
overwrites existing files with matching filename.
- False
no overwriting of files.
- Example
Normalization given; for example, take it from run 18440:

    processor.readRun(18440)
    processor.addBinning('posX', 500, 1000, 2)
    processor.addBinning('posY', 500, 1000, 2)
    processor.addBinning('dldDetectorId', -1, 2, 1)
    norm = processor.computeBinnedData()
    norm = np.nan_to_num(norm)
    norm[norm < 10] = 1  # 10 or smaller seems to be outside of the detector
    norm[:, :, 0][norm[:, :, 0] >= 10] /= norm[:, :, 0][norm[:, :, 0] >= 10].mean()
    norm[:, :, 1][norm[:, :, 1] >= 10] /= norm[:, :, 1][norm[:, :, 1] >= 10].mean()
    norm[norm < 0.05] = 0.1

- Raises
- Exception (“Wrong dimension”)
if the data from binnedData has dimensions different from 4.
- save_binned(binnedData, file_name, saveFrames=True, path=None, mode='w')[source]
Save a binned numpy array to h5 file. The file includes the axes (taken from the scheduled bins) and the delay stage histogram, if it exists.
Parameters
- binnedData: numpy array
Binned multidimensional data.
- file_name: str
Name of the saved file. The extension ‘.h5’ is automatically added.
- path: str | None
File path.
- mode: str | ‘w’
Write mode of h5 file (‘w’ = write).
- setDefaultSettings(settings_file_name, preserve_path=True)[source]
Load settings from another saved settings file.
To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Use this name in this method to load its content into the SETTINGS.ini file.
- Args:
- settings_file_name: str
Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.
- preserve_path: bool | True
Disables overwriting local file saving paths. Defaults to True.
- property settings
Easy access to settings.ini file
- Returns:
dictionary with the settings file structure
- property shape
- property size
- property size_mb
- update_metadata(compute_histograms=True, fast_mode=False)[source]
Creates a dictionary with the most relevant metadata.
Args
- fast_mode: bool | False
if True, skips the heavy computation steps which take a long time.
Returns
- metadata: dict
dictionary with metadata information
- # TODO: distribute metadata generation in the appropriate methods.
This can be done as with “sample” and “workflow parameters”: they have their own dictionaries independent from metadata, and metadata only collects these into one dict to be used or stored.
- workflow()[source]
Decorator function to automatically save workflow parameters.
When a method is decorated with this decorator, the parameters passed when calling it are automatically added to the workflow_parameters dictionary.
- Warning:
only explicitly passed arguments get saved. Default values will not be recorded.
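The recording behavior described above can be sketched with a minimal decorator (a simplified stand-in, not the actual implementation; `shift_energy` is a hypothetical method name):

```python
import functools

workflow_parameters = {}  # stands in for the processor's workflow parameter dictionary

def workflow_sketch(func):
    """Record only the arguments the caller actually passed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # defaults are NOT recorded: only explicitly passed values appear here
        workflow_parameters[func.__name__] = {'args': list(args), 'kwargs': dict(kwargs)}
        return func(*args, **kwargs)
    return wrapper

@workflow_sketch
def shift_energy(offset=0.0):  # hypothetical workflow step
    return offset
```

Calling `shift_energy(offset=1.2)` records `{'args': [], 'kwargs': {'offset': 1.2}}`, while calling `shift_energy()` records empty parameters, since the default value is never captured.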
- property workflow_parameters
Safe access to the workflow parameter dictionary