blueetl.parallel
Parallelization utilities.
Functions
| call_by_simulation | Execute the given function in parallel, one task for each simulation. |
| merge_filter | Merge the specified columns of the list of DataFrames, and call func for each combination. |
| merge_groupby | Merge a list of DataFrames, group by the given keys, and yield keys and groups. |
- blueetl.parallel.call_by_simulation(simulations: DataFrame, dataframes_to_filter: dict[str, DataFrame], func: Callable, how: str = 'dataframe') → list[Any]
Execute the given function in parallel, one task for each simulation.
- Parameters:
simulations – DataFrame of simulations.
dataframes_to_filter – dict of DataFrames to be filtered by simulation_id and/or circuit_id and passed to each subprocess.
func –
callable called for each simulation, accepting:
simulation_row: NamedTuple (or the type specified with the how parameter).
filtered_dataframes: dict of DataFrames filtered by simulation_id and/or circuit_id.
If the function has other parameters, they can be bound using functools.partial; the bound values are serialized and passed unchanged to the subprocesses.
jobs – number of jobs (see run_parallel)
backend – parallel backend (see run_parallel)
how – format of the simulation_row parameter passed to the func callback. It can be one of “namespace”, “namedtuple”, “dict”, “series”, “dataframe”.
- Returns:
list of results
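The callback contract described above can be sketched without blueetl or pandas. In this minimal example, SimulationRow, count_spikes, and min_time are hypothetical names, and plain lists of dicts stand in for the filtered DataFrames. It only shows how functools.partial binds an extra parameter so that the resulting callable accepts the two documented arguments.

```python
from collections import namedtuple
from functools import partial

# Hypothetical stand-in for one row of the simulations DataFrame.
SimulationRow = namedtuple("SimulationRow", ["simulation_id", "circuit_id"])


def count_spikes(simulation_row, filtered_dataframes, min_time=0.0):
    """Callback taking the two documented parameters plus one extra (min_time)."""
    spikes = filtered_dataframes["spikes"]
    count = sum(1 for spike in spikes if spike["time"] >= min_time)
    return simulation_row.simulation_id, count


# Bind the extra parameter up front; func now needs only the two documented arguments.
func = partial(count_spikes, min_time=10.0)

# Simulate a single task: one row, and data assumed to be already filtered for it.
row = SimulationRow(simulation_id=0, circuit_id=0)
filtered = {"spikes": [{"time": 5.0}, {"time": 12.5}, {"time": 20.0}]}
result = func(row, filtered)
print(result)  # (0, 2)
```

With blueetl installed, a partial object like func would presumably be passed as the func argument of call_by_simulation. Defining the wrapped function at module level keeps it picklable, which matters when it has to be serialized for subprocesses.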
- blueetl.parallel.merge_filter(df_list: list[DataFrame], groupby: list[str], func: Callable[[int, NamedTuple, list[DataFrame]], Any]) → list
Merge the specified columns of the list of DataFrames, and call func for each combination.
The merge operation is similar to a SQL left outer join.
- Parameters:
df_list – list of DataFrames.
groupby – list of columns to consider across the DataFrames.
func –
callback executed for each calculated combination of columns, with parameters:
task_index (int): task index.
key (NamedTuple): key used to filter the DataFrames passed to each function call.
df_list (list[pd.DataFrame]): list of DataFrames filtered by key.
- Returns:
list of values returned by the callback function.
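To make the callback parameters concrete, here is a sequential, pure-Python sketch; it is not blueetl's implementation. The names merge_filter_sketch and summarize are hypothetical, lists of dicts stand in for DataFrames, and the sketch assumes that every table contains all the groupby columns and that the key combinations are taken from the first (left) table.

```python
from collections import namedtuple


def merge_filter_sketch(tables, groupby, func):
    """Illustrative, sequential stand-in for the documented behaviour."""
    Key = namedtuple("Key", groupby)
    # Assumption: unique key combinations come from the first table, in order of appearance.
    keys = list(dict.fromkeys(Key(*(row[c] for c in groupby)) for row in tables[0]))
    results = []
    for task_index, key in enumerate(keys):
        # Filter every table by the current key before calling the callback.
        filtered = [
            [row for row in table if all(row[c] == getattr(key, c) for c in groupby)]
            for table in tables
        ]
        results.append(func(task_index, key, filtered))
    return results


def summarize(task_index, key, df_list):
    """Callback with the documented parameters: task_index, key, df_list."""
    return task_index, key.simulation_id, [len(table) for table in df_list]


spikes = [
    {"simulation_id": 0, "gid": 1},
    {"simulation_id": 0, "gid": 2},
    {"simulation_id": 1, "gid": 3},
]
reports = [{"simulation_id": 0, "value": 0.5}]

results = merge_filter_sketch([spikes, reports], ["simulation_id"], summarize)
print(results)  # [(0, 0, [2, 1]), (1, 1, [1, 0])]
```

The second result shows the left-join flavour of the operation: simulation 1 has no rows in reports, yet the callback is still invoked for it and receives an empty table.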
- blueetl.parallel.merge_groupby(df_list: list[DataFrame], groupby: list[str]) → Iterator[tuple[NamedTuple, DataFrame]]
Merge a list of DataFrames, group by the given keys, and yield keys and groups.
The merge operation is similar to a SQL left outer join, but the DataFrames are filtered in the main process and merged in subprocesses.
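The merge-then-group idea can be illustrated with a single-process, pure-Python generator; it is only a sketch and does not reproduce blueetl's process handling. The name merge_groupby_sketch is hypothetical, lists of dicts stand in for DataFrames, and every table is assumed to contain all the groupby columns.

```python
from collections import namedtuple


def merge_groupby_sketch(tables, groupby):
    """Left-join the tables on the groupby columns, then yield (key, group) pairs."""
    Key = namedtuple("Key", groupby)

    def key_of(row):
        return tuple(row[c] for c in groupby)

    merged = [dict(row) for row in tables[0]]
    for right in tables[1:]:
        joined = []
        for left_row in merged:
            matches = [r for r in right if key_of(r) == key_of(left_row)]
            if matches:
                joined.extend({**left_row, **match} for match in matches)
            else:
                # Left outer join: keep the left row even when nothing matches.
                # (pandas would fill the missing columns with NaN; here they are absent.)
                joined.append(left_row)
        merged = joined

    # Group the merged rows by key, preserving the order of first appearance.
    groups = {}
    for row in merged:
        groups.setdefault(Key(*key_of(row)), []).append(row)
    yield from groups.items()


neurons = [
    {"circuit_id": 0, "gid": 1},
    {"circuit_id": 0, "gid": 2},
    {"circuit_id": 1, "gid": 3},
]
circuits = [{"circuit_id": 0, "name": "A"}]

groups = list(merge_groupby_sketch([neurons, circuits], ["circuit_id"]))
for key, group in groups:
    print(key, len(group))
```

Because the function is a generator, groups are produced lazily and can be consumed in a plain for loop.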