2. Dataframe creator class (DldFlashDataframeCreatorExpress)

class processor.DldFlashDataframeCreatorExpress.DldFlashProcessorExpress(runNumber=None, channels=None, settings=None, beamtime_dir=None, parquet_path=None, parquet_dir=None, beamtime_id=None, year=None, daq='fl1user2', silent=False)

The class generates multi-indexed, multidimensional pandas DataFrames from the new FLASH data format, resolved by macrobunch and microbunch as well as per electron.

property addChannels

Add new channels using a dict of the following format:

"channel_name": {
    "format": "per_pulse" | "per_train" | "per_electron",
    "group_name": "channel_group_path",
    "slice": ":"
}
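As an illustration of the dict format above, the sketch below builds one channel entry and checks it. It assumes that "slice" holds a Python slice string applied to the channel's dataset; the channel name, the group path, and the helpers parse_slice and validate_channels are hypothetical and not part of the library.

```python
# Hypothetical example of a channel definition in the addChannels format.
# The channel name and group path are placeholders, not real FLASH paths.
VALID_FORMATS = {"per_pulse", "per_train", "per_electron"}

new_channels = {
    "myChannel": {
        "format": "per_pulse",
        "group_name": "/path/to/channel/group/",  # placeholder HDF5 group path
        "slice": ":",
    }
}


def parse_slice(text):
    """Turn a slice string such as ":" or "0:10:2" into a slice object.

    Assumption: "slice" is written in Python slice syntax with 1 or 2 colons.
    """
    parts = text.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"invalid slice string: {text!r}")
    # Empty fields (e.g. the bare ":") become None, as in normal slicing.
    return slice(*(int(p) if p.strip() else None for p in parts))


def validate_channels(channels):
    """Check that each entry has the required keys and a known format."""
    for name, spec in channels.items():
        missing = {"format", "group_name"} - set(spec)
        if missing:
            raise KeyError(f"{name}: missing keys {sorted(missing)}")
        if spec["format"] not in VALID_FORMATS:
            raise ValueError(f"{name}: unknown format {spec['format']!r}")
```

With this reading, the default ":" selects the whole dataset, while a string such as "0:10:2" would keep every second entry of the first ten.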

property availableChannels: Returns the channel names that are available for use, excluding pulseId, defined by the json file

property channelsPerElectron: Returns a list of channels with per_electron format

property channelsPerPulse: Returns a list of channels with per_pulse format, including all auxillary channels

property channelsPerTrain: Returns a list of channels with per_train format

concatenateChannels(h5_file, format_=None)[source]: Returns a concatenated pandas DataFrame for either all pulse resolved or all electron resolved channels.
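Concatenating channels of the same resolution can be pictured as a column-wise join of per-channel DataFrames on their shared MultiIndex. This is a minimal sketch using pd.concat; the helper name, the channel names, and the index values are hypothetical, and the library may assemble its DataFrame differently.

```python
import pandas as pd


def concatenate_channels(frames):
    """Join per-channel DataFrames column-wise on their shared MultiIndex.

    Rows missing from one channel's index would appear as NaN in its column.
    """
    return pd.concat(frames, axis=1)


# Two hypothetical pulse-resolved channels indexed by (trainId, pulseId).
index = pd.MultiIndex.from_product([[100, 101], [0, 1]], names=["trainId", "pulseId"])
channel_a = pd.DataFrame({"channelA": [0.1, 0.2, 0.3, 0.4]}, index=index)
channel_b = pd.DataFrame({"channelB": [1.0, 2.0, 3.0, 4.0]}, index=index)

pulse_df = concatenate_channels([channel_a, channel_b])
```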

createDataframePerChannel(h5_file, channel)[source]: Returns a pandas DataFrame for a given channel name for a given file. The Dataframe contains the MultiIndex and returns depending on the channel’s format

createDataframePerFile(file_path)[source]: Returns two pandas DataFrames constructed for the given file. The DataFrames contains the datasets from the iterable in the order opposite to specified by channel names. One DataFrame is pulse resolved and the other electron resolved.

createMultiIndexPerElectron(h5_file)[source]: Creates an index per electron using pulseId for usage with the electron resolved pandas dataframe
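One way to build a per-electron index is sketched below. It assumes the pulseId dataset has shape (n_trains, max_electrons) and is NaN-padded where a train recorded fewer electrons, and it numbers the electrons within each (trainId, pulseId) pair with a groupby cumcount. The function name, the index level names, and this padding layout are assumptions, not the library's confirmed implementation. Note that dropping the padded slots means the electron-resolved channel arrays would need the same mask.

```python
import numpy as np
import pandas as pd


def multiindex_per_electron(train_ids, pulse_ids):
    """Build a (trainId, pulseId, electronId) MultiIndex.

    Assumed layout: train_ids has shape (n_trains,), pulse_ids has shape
    (n_trains, max_electrons) and is NaN-padded for unused electron slots.
    """
    pulse_ids = np.asarray(pulse_ids, dtype=float)
    n_trains, max_electrons = pulse_ids.shape
    # One row per (train, electron slot); NaN padding rows are dropped.
    flat = pd.DataFrame({
        "trainId": np.repeat(np.asarray(train_ids), max_electrons),
        "pulseId": pulse_ids.ravel(),
    }).dropna()
    flat["pulseId"] = flat["pulseId"].astype(np.int64)
    # Count electrons within each (trainId, pulseId) pair: 0, 1, 2, ...
    flat["electronId"] = flat.groupby(["trainId", "pulseId"]).cumcount()
    return pd.MultiIndex.from_frame(flat[["trainId", "pulseId", "electronId"]])


# Train 10 has two electrons in pulse 0 and one in pulse 1; train 11 has one.
idx = multiindex_per_electron(
    np.array([10, 11]),
    np.array([[0, 0, 1], [2, np.nan, np.nan]]),
)
```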

createMultiIndexPerPulse(train_id, np_array)[source]: Creates an index per pulse using a pulse resovled channel’s macrobunch ID, for usage with the pulse resolved pandas dataframe
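A per-pulse index pairs each train ID with the position of a pulse inside that train. The sketch below assumes np_array has shape (n_trains, n_pulses) and that the pulse ID is simply the column position; the function name and level names are hypothetical. Flattening the array in row-major order then lines the values up with the index.

```python
import numpy as np
import pandas as pd


def multiindex_per_pulse(train_ids, np_array):
    """(trainId, pulseId) MultiIndex for an array of shape (n_trains, n_pulses)."""
    train_ids = np.asarray(train_ids)
    n_trains, n_pulses = np.asarray(np_array).shape
    if len(train_ids) != n_trains:
        raise ValueError("need exactly one train ID per row of np_array")
    return pd.MultiIndex.from_arrays(
        [np.repeat(train_ids, n_pulses),              # 100, 100, 100, 101, ...
         np.tile(np.arange(n_pulses), n_trains)],     # 0, 1, 2, 0, ...
        names=["trainId", "pulseId"],
    )


# Hypothetical pulse-resolved data: 2 trains with 3 pulses each.
values = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
series = pd.Series(values.ravel(), index=multiindex_per_pulse([100, 101], values))
```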

createNumpyArrayPerChannel(h5_file, channel)[source]: Returns a numpy Array for a given channel name for a given file

fillNA()[source]: Routine to fill the NaN values with intrafile forward filling.

h5_to_parquet(h5, prq)

readData(runs=None, ignore_missing_runs=False, settings=None, channels=None, beamtime_dir=None, parquet_path=None, beamtime_id=None, year=None, daq='fl1user2')

Reads express data from the DAQ, writing intermediate parquet files along the way.

Args:

runs: int | list: run number or list of run numbers to load
ignore_missing_runs: bool: if False, raises FileNotFoundError when the files for a run are not available.
settings: str | path: pointer to the ini settings file, handled by the dldProcessor class. It can be the name of a default settings file found in the settings dir of the repo, or the path to a specific settings file.
channels: list: list of channel names to include in the dataframe. If None, defaults to all available channels.
beamtime_dir: str | path: path to the raw data. If None, it is inferred from the settings file.
parquet_path: str | path: path, relative to beamtime_dir, where the parquet files are stored. Defaults to "beamtime_dir/processed/parquet".
beamtime_id: int: the id of the beamtime. If None, it is inferred from the settings.
year: int: the year of the beamtime. If None, it is inferred from the settings.
daq: str: the DAQ containing the data. If None, it is inferred from the settings.

Returns:

prc: DldProcessor: an instance of the processor class with the electron and pulse DataFrames loaded.
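The runs and parquet_path arguments described above can be normalized as in the following sketch: a single run number becomes a one-element list, and the parquet directory is resolved relative to beamtime_dir with the documented default of processed/parquet. The helper normalize_read_args is hypothetical; in particular, rejecting runs=None is a choice made for this sketch, not documented library behavior.

```python
from pathlib import Path


def normalize_read_args(runs, beamtime_dir, parquet_path=None):
    """Return (list of run numbers, parquet directory) for a readData-style call."""
    if runs is None:
        # Sketch-only choice: require at least one run.
        raise ValueError("at least one run number is required")
    run_list = [runs] if isinstance(runs, int) else list(runs)
    # parquet_path is relative to beamtime_dir; default: processed/parquet.
    relative = Path(parquet_path) if parquet_path is not None else Path("processed", "parquet")
    return run_list, Path(beamtime_dir) / relative
```

For example, normalize_read_args(44455, "/data/beamtime") would give ([44455], Path("/data/beamtime/processed/parquet")).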

property removeChannels: Removes the unnecessary channels from the available channels using list of channels to remove

resetMultiIndex()[source]: Resets the index per pulse and electron

runFilesNames(run_number, daq, raw_data_dir)[source]: Returns all filenames of given run located in directory for the given daq.
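Finding the files of one run is essentially a glob over the raw data directory. The sketch below uses a hypothetical naming pattern containing the DAQ name and "_run<number>_"; the real FLASH file names may differ, so the pattern is an assumption. The trailing underscore keeps run 12 from also matching run 123, and sorting the result is a choice made here for reproducible ordering.

```python
import tempfile
from pathlib import Path


def run_file_names(run_number, daq, raw_data_dir):
    """Return the sorted .h5 files of one run for one DAQ.

    The glob pattern is a hypothetical naming convention, not the real one.
    """
    pattern = f"*{daq}*_run{run_number}_*.h5"
    return sorted(Path(raw_data_dir).glob(pattern))


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["fl1user2_run12_file1.h5", "fl1user2_run12_file2.h5",
                 "fl1user2_run123_file1.h5", "fl1user1_run12_file1.h5"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in run_file_names(12, "fl1user2", tmp)]
```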