2. Dataframe creator class (DldFlashDataframeCreatorExpress)

class processor.DldFlashDataframeCreatorExpress.DldFlashProcessorExpress(runNumber=None, channels=None, settings=None, beamtime_dir=None, parquet_path=None, parquet_dir=None, beamtime_id=None, year=None, daq='fl1user2', silent=False)[source]

This class generates multi-indexed pandas DataFrames from the new FLASH data format, resolved by macrobunch (train), microbunch (pulse) and electron.

property addChannels

Add new channels using a dict of the form:

"channel_name": {"format": "per_pulse" | "per_train" | "per_electron", "group_name": "channel_group_path", "slice": ":"}
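As a minimal sketch of the dict layout described above, the snippet below builds one channel entry and checks it. The channel name my_channel, the placeholder group path and the helper validate_channel are hypothetical and not part of the class; treating all three keys as required is also an assumption.

```python
# Allowed values for the "format" key, as listed in the description above.
ALLOWED_FORMATS = {"per_pulse", "per_train", "per_electron"}
# Assumption: all three documented keys must be present.
REQUIRED_KEYS = {"format", "group_name", "slice"}


def validate_channel(name, spec):
    """Hypothetical helper: check one channel entry against the documented layout."""
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise KeyError(f"channel {name!r} is missing keys: {sorted(missing)}")
    if spec["format"] not in ALLOWED_FORMATS:
        raise ValueError(f"channel {name!r} has unknown format {spec['format']!r}")
    return True


# Hypothetical entry; the group path is a placeholder, not a real FLASH path.
new_channels = {
    "my_channel": {
        "format": "per_pulse",
        "group_name": "channel_group_path",
        "slice": ":",
    }
}

for channel_name, channel_spec in new_channels.items():
    validate_channel(channel_name, channel_spec)
```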

property availableChannels

Returns the names of the channels available for use, as defined by the JSON file, excluding pulseId.

property channelsPerElectron

Returns a list of channels with per_electron format

property channelsPerPulse

Returns a list of channels with per_pulse format, including all auxiliary channels

property channelsPerTrain

Returns a list of channels with per_train format
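The three format-specific properties above can be thought of as filters on the "format" key of the channel dict. The following is a sketch of that idea only: the helper channels_with_format and the channel names are hypothetical, and the special handling of auxiliary channels mentioned for channelsPerPulse is not modelled.

```python
def channels_with_format(channels, format_):
    """Hypothetical helper: names of all channels whose "format" equals format_."""
    return [name for name, spec in channels.items() if spec["format"] == format_]


# Hypothetical channel definitions using the documented dict layout.
channels = {
    "channel_a": {"format": "per_electron", "group_name": "group_a", "slice": ":"},
    "channel_b": {"format": "per_pulse", "group_name": "group_b", "slice": ":"},
    "channel_c": {"format": "per_train", "group_name": "group_c", "slice": ":"},
    "channel_d": {"format": "per_pulse", "group_name": "group_d", "slice": ":"},
}

per_electron = channels_with_format(channels, "per_electron")
per_pulse = channels_with_format(channels, "per_pulse")
per_train = channels_with_format(channels, "per_train")
```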

concatenateChannels(h5_file, format_=None)[source]

Returns a concatenated pandas DataFrame for either all pulse-resolved or all electron-resolved channels.
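To illustrate what concatenating channels on a shared index looks like, here is a small pandas sketch. It is not the method's implementation. The index level names trainId and pulseId, the channel names and the values are all assumptions made for the example.

```python
import pandas as pd

# Assumed shared index: two trains with two pulses each.
index = pd.MultiIndex.from_product(
    [[100, 101], [0, 1]], names=["trainId", "pulseId"]
)

# One single-column DataFrame per (hypothetical) pulse-resolved channel.
channel_a = pd.DataFrame({"channel_a": [1.0, 2.0, 3.0, 4.0]}, index=index)
channel_b = pd.DataFrame({"channel_b": [10.0, 20.0, 30.0, 40.0]}, index=index)

# Column-wise concatenation aligns the rows on the shared MultiIndex.
combined = pd.concat([channel_a, channel_b], axis=1)
```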

createDataframePerChannel(h5_file, channel)[source]

Returns a pandas DataFrame for a given channel name and file. The DataFrame carries a MultiIndex, and its layout depends on the channel’s format.

createDataframePerFile(file_path)[source]

Returns two pandas DataFrames constructed for the given file. The DataFrames contain the datasets from the iterable, in the order opposite to that specified by the channel names. One DataFrame is pulse-resolved and the other electron-resolved.

createMultiIndexPerElectron(h5_file)[source]

Creates an index per electron using pulseId, for use with the electron-resolved pandas DataFrame
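One possible way to build a per-electron index from pulseId is to number the electrons within each pulse. The sketch below does this with a pandas groupby and makes no claim about how the method itself works. The column names trainId, pulseId, electronId and x, and the sample values, are assumptions.

```python
import pandas as pd

# Hypothetical electron-resolved data: one row per detected electron.
electrons = pd.DataFrame(
    {
        "trainId": [1000, 1000, 1000, 1001, 1001],
        "pulseId": [0, 0, 1, 0, 0],
        "x": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

# Number the electrons inside each (train, pulse) group: 0, 1, 2, ...
electrons["electronId"] = electrons.groupby(["trainId", "pulseId"]).cumcount()

# Use the three identifiers together as the MultiIndex.
indexed = electrons.set_index(["trainId", "pulseId", "electronId"])
```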

createMultiIndexPerPulse(train_id, np_array)[source]

Creates an index per pulse using a pulse-resolved channel’s macrobunch ID, for use with the pulse-resolved pandas DataFrame
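A per-pulse index of this kind can be sketched as the product of the train (macrobunch) IDs and the pulse positions within each train. The helper below is hypothetical. It assumes np_array has shape (number of trains, number of pulses) and uses the assumed level names trainId and pulseId.

```python
import numpy as np
import pandas as pd


def multi_index_per_pulse(train_id, np_array):
    """Hypothetical helper: pair every train ID with every pulse position."""
    n_pulses = np_array.shape[1]
    return pd.MultiIndex.from_product(
        [train_id, np.arange(n_pulses)], names=["trainId", "pulseId"]
    )


# Three trains with four pulses each (made-up values).
train_id = np.array([1000, 1001, 1002])
values = np.arange(12, dtype=float).reshape(3, 4)

index = multi_index_per_pulse(train_id, values)

# Row-major flattening matches the index order (train outermost, pulse innermost).
df = pd.DataFrame({"channel": values.ravel()}, index=index)
```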

createNumpyArrayPerChannel(h5_file, channel)[source]

Returns a NumPy array for a given channel name and file

fillNA()[source]

Routine to fill the NaN values with intrafile forward filling.
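Forward filling replaces each NaN with the last valid value that precedes it. The pandas sketch below shows the effect on a single made-up column and is not the routine itself; the index level names and the column name are assumptions. A NaN at the very start has no earlier value to copy, so it stays NaN when filling is restricted to one file.

```python
import numpy as np
import pandas as pd

# Assumed index: two trains with three pulses each.
index = pd.MultiIndex.from_product(
    [[1000, 1001], [0, 1, 2]], names=["trainId", "pulseId"]
)

# Hypothetical sparse channel: values are present only on some rows.
df = pd.DataFrame(
    {"sparse_channel": [np.nan, 5.0, np.nan, np.nan, 7.0, np.nan]}, index=index
)

# Propagate the last valid value forward; the leading NaN is left untouched.
filled = df.ffill()
```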

h5_to_parquet(h5, prq)[source]

readData(runs=None, ignore_missing_runs=False, settings=None, channels=None, beamtime_dir=None, parquet_path=None, beamtime_id=None, year=None, daq='fl1user2')[source]

Reads express data from the DAQ, generating parquet files as an intermediate step.

Args:
runs: int | list

run number or list of run numbers to load

ignore_missing_runs: bool

if False, raises FileNotFoundError if the files for a run are not available.

settings: str | path

pointer to the ini settings file, handled by the dldProcessor class. It can be the name of a default settings file found in the settings dir of the repo, or the path to a specific settings file.

channels: list

list of channel names to include in the dataframe. If none, defaults to all available channels

beamtime_dir: str | path

path to the raw data. If none, it is inferred from the settings file

parquet_path: str | path

path, relative to beamtime_dir, where the parquet files are stored. Defaults to "beamtime_dir/processed/parquet"

beamtime_id: int

the id of the beamtime. If none it is inferred from settings

year: int

the year of the beamtime. If none it is inferred from settings

daq: str

the daq containing the data. If none it is inferred from settings

Returns:
prc: DldProcessor

returns an instance of the processor class, with electron and pulse dataframes loaded.
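The documented behaviour of runs (a single int or a list) and ignore_missing_runs can be sketched in plain Python as follows. The helper select_runs and the mapping available_files are hypothetical stand-ins for whatever file discovery readData performs, and silently skipping a missing run when ignore_missing_runs is True is an assumption.

```python
def select_runs(runs, available_files, ignore_missing_runs=False):
    """Hypothetical helper mirroring the documented runs / ignore_missing_runs options."""
    # Accept either a single run number or a list of run numbers.
    if isinstance(runs, int):
        runs = [runs]

    selected = {}
    for run in runs:
        files = available_files.get(run, [])
        if not files:
            if ignore_missing_runs:
                # Assumption: a missing run is skipped without an error.
                continue
            raise FileNotFoundError(f"no files found for run {run}")
        selected[run] = files
    return selected


# Made-up mapping from run number to the files found for that run.
available_files = {1: ["run1_file1.h5", "run1_file2.h5"], 3: ["run3_file1.h5"]}

single = select_runs(1, available_files)
tolerant = select_runs([1, 2, 3], available_files, ignore_missing_runs=True)
```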

property removeChannels

Removes unnecessary channels from the available channels, given a list of channels to remove

resetMultiIndex()[source]

Resets the index per pulse and electron

runFilesNames(run_number, daq, raw_data_dir)[source]

Returns all filenames of the given run located in the given directory for the given daq.
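A file lookup of this kind can be sketched with pathlib. The file naming scheme used below (daq name, then run<number>_, then an .h5 suffix) is purely hypothetical and is not the real FLASH naming convention. The function run_file_names is an illustration, not the method's implementation.

```python
import tempfile
from pathlib import Path


def run_file_names(run_number, daq, raw_data_dir):
    """Hypothetical sketch: list files whose name contains the daq and the run number."""
    # The trailing underscore keeps run 1 from also matching run 10.
    pattern = f"*{daq}*run{run_number}_*.h5"
    return sorted(path.name for path in Path(raw_data_dir).glob(pattern))


# Demonstration on a temporary directory filled with made-up file names.
with tempfile.TemporaryDirectory() as raw_data_dir:
    for name in [
        "fl1user2_run1_file1.h5",
        "fl1user2_run1_file2.h5",
        "fl1user2_run10_file1.h5",
        "other_run1_file1.h5",
    ]:
        (Path(raw_data_dir) / name).touch()

    found = run_file_names(1, "fl1user2", raw_data_dir)
    found_missing = run_file_names(99, "fl1user2", raw_data_dir)
```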