blueetl.parallel

Parallelization utilities.

Functions

`call_by_simulation`(simulations, ...[, how])	Execute the given function in parallel, one task for each simulation.
`merge_filter`(df_list, groupby, func)	Merge the specified columns of the list of DataFrames, and call func for each combination.
`merge_groupby`(df_list, groupby)	Merge a list of DataFrames, group by the given keys, and yield keys and groups.

blueetl.parallel.call_by_simulation(simulations: DataFrame, dataframes_to_filter: dict[str, DataFrame], func: Callable, how: str = 'dataframe') → list[Any]

Execute the given function in parallel, one task for each simulation.

Parameters:

simulations – DataFrame of simulations.
dataframes_to_filter – dict of DataFrames to filter by simulation_id and/or circuit_id, and passed to each subprocess.
func –
callable called for each simulation, accepting:
- simulation_row: NamedTuple (or the type specified with the how parameter).
- filtered_dataframes: dict of DataFrames filtered by simulation_id and/or circuit_id.
If the function has other parameters, they can be bound using functools.partial, and they will be serialized and passed unchanged to the subprocesses.
jobs – number of jobs (see run_parallel)
backend – parallel backend (see run_parallel)
how – format of the simulation_row parameter passed to the func callback. It can be one of “namespace”, “namedtuple”, “dict”, “series”, “dataframe”.

Returns:

list of results
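
The way extra parameters can be bound to the callback with functools.partial may be easier to see in a small sketch. The example below does not call blueetl: `count_rows`, `min_rows`, and `SimulationRow` are hypothetical names, the row is assumed to arrive as a named tuple, and plain lists stand in for the filtered DataFrames so that the snippet has no dependencies.

```python
from collections import namedtuple
from functools import partial

# Hypothetical row type, standing in for the simulation_row argument
# when it is delivered as a named tuple.
SimulationRow = namedtuple("SimulationRow", ["simulation_id", "circuit_id"])


def count_rows(simulation_row, filtered_dataframes, min_rows):
    """Hypothetical callback: flag the tables that have at least min_rows rows."""
    return {
        name: len(table) >= min_rows
        for name, table in filtered_dataframes.items()
    }


# Bind the extra parameter; the resulting callable accepts only the two
# documented arguments (simulation_row, filtered_dataframes).
func = partial(count_rows, min_rows=2)

row = SimulationRow(simulation_id=0, circuit_id=0)
# Lists stand in for the filtered DataFrames (assumption for this sketch);
# len() behaves the same way on a DataFrame.
tables = {"spikes": [10.5, 11.0, 12.5], "neurons": [1]}
result = func(row, tables)
print(result)
```

A callable built this way could then be passed as the func argument, provided the wrapped function is defined at module level so that it can be serialized.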

blueetl.parallel.merge_filter(df_list: list[DataFrame], groupby: list[str], func: Callable[[int, NamedTuple, list[DataFrame]], Any]) → list

Merge the specified columns of the list of DataFrames, and call func for each combination.

The merge operation is similar to a SQL left outer join.

Parameters:

df_list – list of DataFrames.
groupby – list of columns to consider across the DataFrames.
func –
callback executed for each calculated combination of columns, with parameters:
- task_index (int): task index.
- key (NamedTuple): key used to filter the DataFrames passed to each function call.
- df_list (list[pd.DataFrame]): list of DataFrames filtered by key.

Returns:

list of values returned by the callback function.
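
The callback contract described above (task index, key, filtered list) can be illustrated with a sequential, dependency-free sketch. This is not blueetl's implementation: `merge_filter_sketch` and `summarize` are hypothetical names, lists of dicts stand in for DataFrames, and taking the key combinations from the first table is one assumed reading of "similar to a SQL left outer join".

```python
from collections import namedtuple


def merge_filter_sketch(tables, groupby, func):
    """Call func once per key combination found in the first (left) table."""
    Key = namedtuple("Key", groupby)
    # Distinct key combinations, in order of first appearance in the left table.
    keys = list(
        dict.fromkeys(Key(*(row[col] for col in groupby)) for row in tables[0])
    )
    results = []
    for task_index, key in enumerate(keys):
        # Filter every table by the current key; a table without matching
        # rows contributes an empty list.
        filtered = [
            [row for row in table if all(row[c] == getattr(key, c) for c in groupby)]
            for table in tables
        ]
        results.append(func(task_index, key, filtered))
    return results


def summarize(task_index, key, df_list):
    """Hypothetical callback: report the number of rows in each filtered table."""
    return (task_index, key.simulation_id, [len(table) for table in df_list])


features = [
    {"simulation_id": 0, "circuit_id": 0, "value": 1.0},
    {"simulation_id": 0, "circuit_id": 0, "value": 2.0},
    {"simulation_id": 1, "circuit_id": 0, "value": 3.0},
]
neurons = [{"simulation_id": 0, "circuit_id": 0, "gid": 7}]

results = merge_filter_sketch(
    [features, neurons], ["simulation_id", "circuit_id"], summarize
)
print(results)
```

The sketch runs the callbacks one after another in a single process; it is only meant to show which arguments each call receives.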

blueetl.parallel.merge_groupby(df_list: list[DataFrame], groupby: list[str]) → Iterator[tuple[NamedTuple, DataFrame]]

Merge a list of DataFrames, group by the given keys, and yield keys and groups.

The merge operation is similar to a SQL left outer join, but the DataFrames are filtered in the main process and merged in subprocesses.
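
As a rough single-process analogue of the merge-then-group behaviour, the following sketch uses plain pandas calls. It is an illustration under stated assumptions, not the library's code: `merge_groupby_sketch` is a hypothetical name, the DataFrames are assumed to share the groupby columns, and no subprocesses are involved.

```python
from collections import namedtuple
from functools import reduce

import pandas as pd


def merge_groupby_sketch(df_list, groupby):
    """Left-join the DataFrames on the groupby columns, then yield (key, group)."""
    Key = namedtuple("Key", groupby)
    # Chain left outer joins: rows of the first DataFrame are always kept,
    # and missing matches in later DataFrames become NaN.
    merged = reduce(
        lambda left, right: left.merge(right, how="left", on=groupby), df_list
    )
    for key, group in merged.groupby(groupby):
        yield Key(*key), group


a = pd.DataFrame(
    {"simulation_id": [0, 0, 1], "circuit_id": [0, 0, 0], "x": [1, 2, 3]}
)
b = pd.DataFrame({"simulation_id": [0], "circuit_id": [0], "y": [10]})

groups = dict(merge_groupby_sketch([a, b], ["simulation_id", "circuit_id"]))
for key, group in groups.items():
    print(key, len(group))
```

Here the group for simulation_id 1 is still produced even though the second DataFrame has no matching row, which is the left outer join behaviour mentioned above.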