blueetl.parallel

Parallelization utilities.

Functions

call_by_simulation(simulations, ...[, how])

Execute the given function in parallel, one task for each simulation.

merge_filter(df_list, groupby, func)

Merge the specified columns of the list of DataFrames, and call func for each combination.

merge_groupby(df_list, groupby)

Merge a list of DataFrames, group by the given keys, and yield keys and groups.

blueetl.parallel.call_by_simulation(simulations: DataFrame, dataframes_to_filter: dict[str, DataFrame], func: Callable, how: str = 'namedtuple') → list[Any]

Execute the given function in parallel, one task for each simulation.

Parameters:
  • simulations – DataFrame of simulations.

  • dataframes_to_filter – dict of DataFrames to be filtered by simulation_id and/or circuit_id and passed to each subprocess.

  • func

    callable called for each simulation, accepting:

    • simulation_row: NamedTuple (or the type specified with the how parameter).

    • filtered_dataframes: dict of DataFrames filtered by simulation_id and/or circuit_id.

    If the function has other parameters, they can be bound using functools.partial, and they will be serialized and passed unchanged to the subprocesses.

  • jobs – number of jobs (see run_parallel)

  • backend – parallel backend (see run_parallel)

  • how – format of the simulation_row parameter passed to the func callback. It can be one of “namespace”, “namedtuple”, “dict”, “series”, “dataframe”.

Returns:

list of results
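
The callback contract described above can be illustrated with a minimal serial sketch. It does not import blueetl and is not the library's implementation: call_serially is a hypothetical stand-in that filters by simulation_id only and runs in a single process, and the column names, data, and the extra t_max parameter are assumptions made for the example.

```python
from functools import partial

import pandas as pd

# Hypothetical input data: two simulations and one DataFrame to filter.
simulations = pd.DataFrame(
    {"simulation_id": [0, 1], "circuit_id": [0, 0], "seed": [11, 22]}
)
spikes = pd.DataFrame(
    {"simulation_id": [0, 0, 1], "gid": [5, 6, 7], "time": [1.0, 2.5, 0.5]}
)


def count_spikes(simulation_row, filtered_dataframes, t_max):
    """Callback taking the two documented parameters plus an extra one (t_max)."""
    df = filtered_dataframes["spikes"]
    return simulation_row.simulation_id, int((df["time"] <= t_max).sum())


def call_serially(simulations, dataframes_to_filter, func):
    """Hypothetical serial stand-in: one func call per simulation row."""
    results = []
    for row in simulations.itertuples(index=False):
        # Keep only the rows belonging to the current simulation.
        filtered = {
            name: df[df["simulation_id"] == row.simulation_id]
            for name, df in dataframes_to_filter.items()
        }
        results.append(func(row, filtered))
    return results


# Bind the extra parameter so that func accepts exactly two arguments.
func = partial(count_spikes, t_max=2.0)
results = call_serially(simulations, {"spikes": spikes}, func)
print(results)
```

With the real function, the same partial object would be passed as func, so the bound t_max value would travel to each subprocess together with the callable.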

blueetl.parallel.merge_filter(df_list: list[DataFrame], groupby: list[str], func: Callable[[int, NamedTuple, list[DataFrame]], Any]) → list

Merge the specified columns of the list of DataFrames, and call func for each combination.

The merge operation is similar to a SQL left outer join.

Parameters:
  • df_list – list of DataFrames.

  • groupby – list of columns to consider across the DataFrames.

  • func

    callback executed for each calculated combination of columns, with parameters:

    • task_index (int): task index.

    • key (NamedTuple): key used to filter the DataFrames passed to each function call.

    • df_list (list[pd.DataFrame]): list of DataFrames filtered by key.

Returns:

list of values returned by the callback function.
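
As a rough illustration of the behaviour described above, the following serial sketch merges only the key columns, then calls the callback once per key with the DataFrames filtered by that key. It is written with plain pandas under stated assumptions, not taken from the library: merge_filter_sketch, filter_by_key, and summarize are hypothetical names, the data is invented, and the left join simply follows the description given for the merge.

```python
import pandas as pd

# Hypothetical DataFrames sharing the key columns.
df_a = pd.DataFrame(
    {"simulation_id": [0, 0, 1], "window": ["w1", "w2", "w1"], "a": [1, 2, 3]}
)
df_b = pd.DataFrame(
    {"simulation_id": [0, 1, 1], "window": ["w1", "w1", "w1"], "b": [10, 20, 30]}
)


def filter_by_key(df, key):
    """Return the rows of df whose key columns match the given namedtuple."""
    mask = pd.Series(True, index=df.index)
    for column, value in key._asdict().items():
        mask &= df[column] == value
    return df[mask]


def merge_filter_sketch(df_list, groupby, func):
    """Serial sketch: merge the key columns, then call func for each key."""
    keys = df_list[0][groupby].drop_duplicates()
    for df in df_list[1:]:
        # Left join on the key columns only (assumption based on the text).
        keys = keys.merge(df[groupby].drop_duplicates(), on=groupby, how="left")
    results = []
    for task_index, key in enumerate(keys.itertuples(index=False, name="Key")):
        filtered = [filter_by_key(df, key) for df in df_list]
        results.append(func(task_index, key, filtered))
    return results


def summarize(task_index, key, df_list):
    """Example callback: report the number of rows left in each DataFrame."""
    return task_index, tuple(key), [len(df) for df in df_list]


results = merge_filter_sketch([df_a, df_b], ["simulation_id", "window"], summarize)
print(results)
```

In this sketch the key (0, "w2") exists only in the first DataFrame, so the callback receives an empty second DataFrame for that task.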

blueetl.parallel.merge_groupby(df_list: list[DataFrame], groupby: list[str]) → Iterator[tuple[NamedTuple, DataFrame]]

Merge a list of DataFrames, group by the given keys, and yield keys and groups.

The merge operation is similar to a SQL left outer join, but the dataframes are filtered in the main process and merged in subprocesses.
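
The merge-then-group idea can be sketched with plain pandas as follows. This is an assumption-laden illustration rather than the library's code: merge_groupby_sketch is a hypothetical single-process generator, the data is invented, and the left join follows the description above.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical DataFrames sharing the key columns.
df_a = pd.DataFrame(
    {"simulation_id": [0, 0, 1], "window": ["w1", "w2", "w1"], "a": [1, 2, 3]}
)
df_b = pd.DataFrame(
    {"simulation_id": [0, 1], "window": ["w1", "w1"], "b": [10, 20]}
)


def merge_groupby_sketch(df_list, groupby):
    """Single-process sketch: left-merge on the keys, then yield (key, group)."""
    merged = df_list[0]
    for df in df_list[1:]:
        merged = merged.merge(df, on=groupby, how="left")
    Key = namedtuple("Key", groupby)
    for key, group in merged.groupby(groupby):
        # Depending on the pandas version, a single key column may give a scalar.
        key = key if isinstance(key, tuple) else (key,)
        yield Key(*key), group


groups = list(merge_groupby_sketch([df_a, df_b], ["simulation_id", "window"]))
for key, group in groups:
    print(key, len(group))
```

Because of the left join, the key (0, "w2"), which is missing from the second DataFrame, still yields a group, with a missing value in column b.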