1. DldProcessor class (DldProcessor)

class processor.DldProcessor.DldProcessor(settings=None, silent=False)[source]

This class simplifies the analysis of data files recorded during the beamtime of August 2017. It converts the electron events recorded by the DLD into a clean table and uses Dask for binning and further analysis.

Attributes (loaded from SETTINGS.ini)
N_CORES: int

The number of available CPU cores to use.

CHUNK_SIZE: int

Size of the chunks in which a parquet file will be divided.

TOF_STEP_NS: float

The step size in ns of the dldTime. Used to convert the step number to the ToF time in the delay line detector.

TOF_STEP_EV: float

The step size in eV of the dldTime. Used to convert the step number to energy of the photoemitted electrons.

DATA_RAW_DIR: str

Path to raw data hdf5 files output by FLASH

DATA_PARQUET_DIR: str

Path to where parquet files are stored.

DATA_H5_DIR: str

Path to where hdf5 files containing binned data are stored.

DATA_RESULTS_DIR: str

Path to the default saving location for results, in np.array, TIFF stack, and other formats.

addBinning(name, start, end=None, steps=None, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False, compute_histograms=False)[source]

Add binning along one dimension, to be computed later with the computeBinnedData method.

Creates a list of bin names (binNameList) to identify the axes on which to bin the data. The dimensions of the output array will follow the order of this list. The attribute binRangeList will contain the ranges of the binning used for the corresponding dimension.

Parameters

name: str

Name of the column to apply binning to. Possible column names are: posX, posY, dldTime, pumpProbeTime, dldDetector, etc.

start: float

Position of first bin

end: float

Position of last bin (not included!)

steps: float

The bin size: if useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.

useStepSize: bool | True

Tells Python how to interpret steps.

True

interpret steps as a step size.

False

interpret steps as the number of steps.

forceEnds: bool | False

Tells Python to give priority to the end parameter rather than the steps parameter (see genBins for more info).

include_last: bool | True

Closes the interval on the right when True. With step-size priority, the interval is expanded to include the next value when True, and shrunk to contain only points within the bounds when False.

force_legacy: bool | False

True

use np.arange method to generate bins.

False

use np.linspace method to generate bins.

Return

axes: numpy array

Axis of the binned dimension. The points defined on this axis are the middle points of each bin.

Note

If the name is ‘pumpProbeTime’: sets self.delaystageHistogram for normalization.

See also

computeBinnedData Method to compute all bins created with this function.
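
For example, a minimal binning setup might look as follows (the column names, ranges, and step sizes are purely illustrative):

processor.addBinning('dldTime', 620, 670, 10)       # bin dldTime from 620 to 670 in steps of 10
processor.addBinning('pumpProbeTime', -5, 20, 0.5)  # also sets the delay stage histogram
result = processor.computeBinnedData()              # dimensions ordered as the bins were added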

addFilter(*args, **kwargs)[source]
addFilterElipse(*args, **kwargs)[source]
add_workflow_step(workflow_step, name=None, overwrite=False, *args, **kwargs)[source]

Adds one or more steps to the workflow.

Args:
workflow_step: dict | str

This is the main descriptor of the workflow step. If a dictionary is passed, it is expected to be a workflow-parameter-like dictionary: {'method': method, 'args': [], 'kwargs': {}}. Otherwise, if a string is passed, it should be the name of the method of the workflow step.

name: str

name to describe the workflow step (key in the workflow_parameters dictionary)

overwrite: bool

optional setting to overwrite previous entries with the same name.

args:

arguments to pass to the method

kwargs:

keyword arguments to pass to the method.
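
A sketch of both calling conventions (correctBAM, a method of this class, is used here purely as an illustration, and the step name is hypothetical):

# string form: name the method, pass its arguments through args/kwargs
processor.add_workflow_step('correctBAM', name='bam_correction')
# dictionary form: a workflow-parameter-like descriptor
step = {'method': 'correctBAM', 'args': [], 'kwargs': {}}
processor.add_workflow_step(step, name='bam_correction', overwrite=True)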

appendDataframeParquet(fileName, path=None)[source]

Append data to an existing dask Parquet dataframe.

This can be used to concatenate multiple DAQ runs in one dataframe. Data is taken from the dd and dd_microbunch dataframe attributes.

Parameter
fileName: str

Name (including path) of the folder containing the parquet files to which the new data is appended.
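
A sketch of concatenating a second run onto an existing parquet store (run numbers and folder name are illustrative; readRun is assumed to fill the dd and dd_microbunch attributes, as in the example under save2hdf5):

processor.readRun(18441)
processor.appendDataframeParquet('run18440_and_18441')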

property binnedArrayShape
property binnedArraySize
calibrateEnergy(*args, **kwargs)[source]
calibrateMomentum(*args, **kwargs)[source]
calibratePumpProbeTime(*args, **kwargs)[source]
compute(*args, **kwargs)[source]

alias of computeBinnedData

computeBinnedData(saveAs=None, return_xarray=None, force_64bit=False, skip_metadata=False, compute_histograms=False, fast_metadata=False, usePbar=True)[source]

Use the bin list to bin the data.

Parameters

saveAs: str | None

Full file name (including path) where the result will be saved (forces return_xarray to be True).

return_xarray: bool

if True, returns an xarray with all available axes and metadata information attached; otherwise returns a numpy.array.

Returns

result: numpy.array or xarray.DataArray

A numpy array of float64 values. The number of bins defined determines the dimensions of the array.

Notes

The postProcess method must be used before computing the binned data if binning along pumpProbeDelay or polar k-space coordinates.
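
Typical calls, sketched with an illustrative file name:

result = processor.computeBinnedData()                        # plain numpy array
result = processor.computeBinnedData(return_xarray=True)      # xarray with axes and metadata attached
result = processor.computeBinnedData(saveAs='binned_run.h5')  # saves to file and returns an xarray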

computeBinnedDataMulti(saveName=None, savePath=None, rank=None, size=None)[source]

Use the bin list to bin the data. Cluster-compatible version (Maciej Dendzik)

Parameters

saveName: str | None

filename

savePath: str | None

file path

rank: int | None

Rank number of the cluster process.

size: int | None

Size of partition

Return

result: numpy array

A numpy array of float64 values. The number of bins determines the dimensionality of the output array.

Notes

postProcess method must be used before computing the binned data if binning is applied along pumpProbeDelay or polar k-space coordinates.

correctBAM(*args, **kwargs)[source]
correctBackLash(*args, **kwargs)[source]
correctDldTime(*args, **kwargs)[source]
createPolarCoordinates(*args, **kwargs)[source]
property delayStageHistogram

Easy access to pump probe normalization array. Mostly for retrocompatibility

delayStageMovingDirection(*args, **kwargs)[source]
deleteBinners()[source]

[DEPRECATED] use resetBins() instead

genBins(start, end, steps, useStepSize=True, forceEnds=False, include_last=True, force_legacy=False)[source]

Creates bins for use by binning functions. Can also be used to generate x axes.

Binning is created using np.linspace (formerly this was done with np.arange). The implementation allows choosing between setting a step size (useStepSize=True, default) or using a number of bins (useStepSize=False).

In general, it is not possible to satisfy all three parameters: start, end, and steps. For this reason, you can choose to give priority to the step size or to the interval size. If forceEnds=False, the steps parameter is given priority and the end parameter is redefined, so the interval can actually be larger than expected. If forceEnds=True, the step size is not enforced, and the interval is divided by the closest step that divides it cleanly. This of course only matters when choosing steps that do not cleanly divide the interval.

Parameters

start: float

Position of first bin

end: float

Position of last bin (not included!)

steps: float

Define the bin size. If useStepSize=True (default), this is the step size; if useStepSize=False, it is the number of bins. In legacy mode (force_legacy=True, or processor._LEGACY_MODE=True), bins are generated with np.arange instead of np.linspace.

useStepSize: bool | True

Tells Python to interpret steps as a step size if True, or as the number of steps if False.

forceEnds: bool | False

Tells Python to give priority to the end parameter rather than the steps parameter (see above for more info).

include_last: bool | True

Closes the interval on the right when True. With step-size priority, the interval is expanded to include the next value when True, and shrunk to contain only points within the bounds when False.

force_legacy: bool | False

If True, imposes the old method for generating bins, based on np.arange instead of np.linspace.
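
To illustrate the priority logic (the values are arbitrary; 3 does not divide the interval 0-10 cleanly):

bins = processor.genBins(0, 10, 3)                  # step-size priority (default): end may be pushed beyond 10
bins = processor.genBins(0, 10, 3, forceEnds=True)  # interval priority: the step is adjusted to divide it cleanly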

initAttributes(import_all=False)[source]

Parse settings file and assign the variables.

Args:
import_all: bool | False

Option selecting which entries to import.

True imports all entries in SETTINGS.ini except those from the [DAQ channels] section; these are handled in the DldFlashProcessor subclass. A warning is printed when an entry is not found as a class attribute.

False only imports those that match existing attribute names.

Warnings:
UserWarning:

when an entry is found in the SETTINGS.ini file which is not present as a pre-defined class attribute, it warns the user to add it to the code.

loadSettings(settings_file_name, preserve_path=True)[source]

Load settings from another saved settings file.

To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Pass that name to this method to load its content into the SETTINGS.ini file.

Args:
settings_file_name: str

Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.

preserve_path: bool | True

Disables overwriting local file saving paths. Defaults to True.
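
For example, assuming a settings file utilities/settings/SETTINGS_aug17.ini exists (the file name is hypothetical):

processor.loadSettings('SETTINGS_aug17.ini')
processor.loadSettings('SETTINGS_aug17.ini', preserve_path=False)  # also adopt the file paths stored in the loaded settings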

static load_binned(file_name, mode='r', ret_type='list')[source]

Load an HDF5 file saved with the save_binned() method. Wrapper for the function utils.load_binned_h5().

Parameters

file_name: str

name of the file to load, including full path

mode: str | ‘r’

Read mode of h5 file (‘r’ = read).

ret_type: str | 'list'

output format for axes and histograms: 'list' generates a list of arrays, ordered as the corresponding dimensions in data; 'dict' generates a dictionary with the names of each axis as keys.

Returns

data: numpy array

Multidimensional data read from h5 file.

axes: numpy array

The axes values associated with the read data.

hist: numpy array

Histogram values associated with the read data.

load_settings(settings_file_name, preserve_path=True)[source]

Retrocompatibility alias for loadSettings.

makeNormHistogram(name, compute=False)[source]
make_GMD_histogram(axis_name, axis_values)[source]
property metadata
normalizeAxisMean(data_array, ax)[source]

Normalize to the mean of the given axis.

Args:
data_array: np.ndarray

array to be normalized

ax: int

axis along which to normalize

Returns:
norm_array: np.ndarray

normalized array

normalizeDelay(data_array, ax=None, preserve_mean=True)[source]

Normalizes the data array to the number of counts per delay stage step.

Parameter
data_array: numpy array

data array containing binned data, as created by the computeBinnedData method.

ax: str

name of the axis to normalize to. The default is None, which uses 'pumpProbeTime' as the normalization axis if found, otherwise 'delayStage'.

preserve_mean: bool | True

if True, divides the histogram by its mean, preserving the average value of the normalized array.

Raise

Raises a ValueError when no pump probe time delay axis is available.

Return
data_array_normalized: numpy array

normalized version of the input array.
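
A typical round trip, assuming pumpProbeTime was one of the binned axes:

result = processor.computeBinnedData()
result_norm = processor.normalizeDelay(result)  # picks 'pumpProbeTime' if present, else 'delayStage'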

normalizeGMD(data_array, axis_name, axis_values)[source]

Create the normalization array for normalizing to the FEL intensity measured at the GMD.

Parameter

data_array: np.ndarray

data to be normalized

axis_name:

name of the axis along which to perform normalization

axis_values:

the bins of the axis_name provided.

Return
normalized_array: np.ndarray

normalized version of the input array
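
A hedged sketch, assuming 'pumpProbeTime' was one of the binned axes and its bin values are at hand (here regenerated with genBins; all values are illustrative):

axis_values = processor.genBins(-5, 20, 0.5)
result_norm = processor.normalizeGMD(result, 'pumpProbeTime', axis_values)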

normalizePumpProbeTime(data_array, ax='pumpProbeTime', preserve_mean=False)[source]

Normalizes the data array to the number of counts per delay stage step.

[DEPRECATED] This method is buggy; the new normalizeDelay function should be used instead.

Parameter
data_array: numpy array

data array containing binned data, as created by the computeBinnedData method.

ax: str | 'pumpProbeTime'

axis name

Raise

Raises a ValueError when no pump probe time delay axis is available.

Return
data_array_normalized: numpy array

normalized version of the input array.

postProcess(*args, **kwargs)[source]
printRunOverview()[source]

Print run information; used in readData and readDataParquet.

property pumpProbeTimeHistogram

Easy access to pump probe normalization array. Mostly for retrocompatibility

readDataframes(fileName=None, path=None, format='parquet')[source]

Load data from a parquet or HDF5 dataframe.

Accesses the data as an HDF5 file (this is the format used internally, NOT the FLASH HDF5 files obtained from the DAQ system!).

Parameters
fileName: str | None

Shared name string of the data files.

path: str | None (defaults to self.DATA_PARQUET_DIR or self.DATA_H5_DIR)

name of the filepath (down to the lowest-level folder)

format: str | 'parquet'

file format, ‘parquet’ (parquet file), ‘h5’ or ‘hdf5’ (hdf5 file).
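
For example (the file name is illustrative):

processor.readDataframes('run18440')               # parquet files from DATA_PARQUET_DIR
processor.readDataframes('run18440', format='h5')  # internal hdf5 format from DATA_H5_DIR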

readDataframesParquet(fileName=None)[source]

[DEPRECATED] Load data from a dask Parquet dataframe. Use readDataframes instead.

Parameter

fileName: str | None

name (including path) of the folder containing parquet files where the data was saved.

read_h5(h5FilePath)[source]

[DEPRECATED] Read the h5 file at given path and return the contained data.

Parameters
h5FilePath: str

Path to the h5 file to read.

Returns

result: np.ndarray

array containing binned data

axes: dict

dictionary with axes data labeled by axes name

histogram: np.array

array of time normalization data.

remove_workflow_step(name)[source]

Removes a step from the workflow.

Args:
name: str | iterable of str

the key(s) which identify the workflow step(s) to remove.
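
For example (the step names are hypothetical):

processor.remove_workflow_step('bam_correction')      # remove a single step
processor.remove_workflow_step(['step_a', 'step_b'])  # or several at once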

resetBins()[source]

Reset the bin list

property sample
save2hdf5(binnedData, path=None, filename='default.hdf5', normalizedData=None, overwrite=False)[source]

[DEPRECATED] Store the binned data in a hdf5 file.

Parameters
binnedData: pd.DataFrame

binned data with bins in dldTime, posX, and posY (and, if it is to be normalized, binned along the detector axis).

filename: str | 'default.hdf5'

name of the file.

path: str | None

path to the location where to save the hdf5 file. If None, uses the default value defined in SETTINGS.ini

normalizedData: bool | None

Normalized data for both detectors; should be a 3D array (posX, posY, detectorID).

overwrite: bool | False

True

overwrites existing files with matching filename.

False

no overwriting of files

Example

Providing normalization data, taken for example from run 18440:

import numpy as np

processor.readRun(18440)
processor.addBinning('posX', 500, 1000, 2)
processor.addBinning('posY', 500, 1000, 2)
processor.addBinning('dldDetectorId', -1, 2, 1)
norm = processor.computeBinnedData()
norm = np.nan_to_num(norm)
norm[norm < 10] = 1  # counts of 10 or smaller seem to lie outside the detector
# normalize each detector channel to its own mean
norm[:, :, 0][norm[:, :, 0] >= 10] /= norm[:, :, 0][norm[:, :, 0] >= 10].mean()
norm[:, :, 1][norm[:, :, 1] >= 10] /= norm[:, :, 1][norm[:, :, 1] >= 10].mean()
norm[norm < 0.05] = 0.1  # floor very small values to avoid division blow-ups

Raises:
    Exception Wrong dimension: if binnedData has a number of dimensions different from 4.
save_binned(binnedData, file_name, saveFrames=True, path=None, mode='w')[source]

Save a binned numpy array to h5 file. The file includes the axes (taken from the scheduled bins) and the delay stage histogram, if it exists.

Parameters

binnedData: numpy array

Binned multidimensional data.

file_name: str

Name of the saved file. The extension ‘.h5’ is automatically added.

path: str | None

File path.

mode: str | ‘w’

Write mode of h5 file (‘w’ = write).
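
A save/load round trip, sketched with illustrative names (save_binned appends the '.h5' extension automatically, so load_binned must include it):

processor.save_binned(result, 'run18440_binned', path='results/')
data, axes, hist = DldProcessor.load_binned('results/run18440_binned.h5')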

setDefaultSettings(settings_file_name, preserve_path=True)[source]

Load settings from another saved settings file.

To save settings, simply copy the SETTINGS.ini file to the utilities/settings folder and rename it. Pass that name to this method to load its content into the SETTINGS.ini file.

Args:
settings_file_name: str

Name of the settings file to load. This file must be in the folder “hextof-processor/utilities/settings”.

preserve_path: bool | True

Disables overwriting local file saving paths. Defaults to True.

property settings

Easy access to the SETTINGS.ini file

Returns:

dictionary with the settings file structure

property shape
property size
property size_mb
update_metadata(compute_histograms=True, fast_mode=False)[source]

Creates a dictionary with the most relevant metadata.

Args

fast_mode: bool | False

if True, skips the heavy computation steps which take a long time.

Returns

metadata: dict

dictionary with metadata information

# TODO: distribute metadata generation in the appropriate methods.

This can be done as with "sample" and "workflow parameters": they have their own dictionaries, independent from metadata, and metadata only collects these into one dict to be used or stored.

workflow()[source]

Decorator function to automatically save workflow parameters.

When a method is decorated with this decorator, the parameters passed when calling it are automatically added to the workflow_parameters dictionary.

Warning:

only explicitly passed arguments get saved. Default values will not be recorded.
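
A speculative sketch of decorating a custom method in a subclass (the subclass, method name, and parameter are hypothetical, and accessing the decorator as DldProcessor.workflow is an assumption):

class MyProcessor(DldProcessor):
    @DldProcessor.workflow
    def shiftTof(self, offset=0.0):
        # explicitly passed arguments are recorded in workflow_parameters
        pass

proc = MyProcessor()
proc.shiftTof(offset=1.5)  # offset=1.5 is saved in the workflow parameters
proc.shiftTof()            # called with defaults: nothing is recorded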

property workflow_parameters

Safe access to the workflow parameter dictionary

exception processor.DldProcessor.SettingsInitializationError[source]