blueetl.analysis

Analysis functions.

Functions

run_from_file(analysis_config_file[, seed, ...])

Initialize and return the MultiAnalyzer.

Classes

Analyzer(analysis_config, repo, features)

Analyzer class.

MultiAnalyzer(global_config[, analyzers])

MultiAnalyzer class.

class blueetl.analysis.Analyzer(analysis_config: SingleAnalysisConfig, repo: Repository, features: FeaturesCollection)

Bases: object

Analyzer class.

Initialize the object.

Parameters:
  • analysis_config – analysis configuration.

  • repo – Repository instance.

  • features – FeaturesCollection instance.

property analysis_config: SingleAnalysisConfig

Return the wrapped analysis configuration.

apply_filter(simulations_filter: dict[str, Any] | None = None) -> Analyzer

Return a new object where the in-memory filter is applied to repo and features.

Before applying the filter, all the repo DataFrames are extracted and all the features DataFrames are calculated, if not already done.

Parameters:

simulations_filter – optional simulations filter; if not specified, use simulations_filter_in_memory from the configuration; if neither is specified, return the original object.
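The fallback order described above can be illustrated with a small stand-in class. This is only a sketch of the documented behaviour, not the library's implementation: FilterableSketch, configured_filter and active_filter are hypothetical names, and the sketch assumes that only None counts as "not specified" (the text does not say how an empty dict is treated).

```python
from typing import Any, Optional


class FilterableSketch:
    """Hypothetical stand-in that mimics only the documented fallback order."""

    def __init__(
        self,
        configured_filter: Optional[dict[str, Any]] = None,
        active_filter: Optional[dict[str, Any]] = None,
    ) -> None:
        # configured_filter plays the role of simulations_filter_in_memory.
        self.configured_filter = configured_filter
        self.active_filter = active_filter

    def apply_filter(
        self, simulations_filter: Optional[dict[str, Any]] = None
    ) -> "FilterableSketch":
        # An explicit argument wins; otherwise fall back to the configured filter.
        chosen = (
            simulations_filter
            if simulations_filter is not None
            else self.configured_filter
        )
        if chosen is None:
            # Neither is specified: return the original object unchanged.
            return self
        # Otherwise return a new object carrying the chosen filter.
        return FilterableSketch(self.configured_filter, active_filter=chosen)
```

Because the original object is returned when no filter applies, callers can use an identity check (`result is analyzer`) to tell whether a filter was actually applied.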

calculate_features() -> None

Calculate all the features defined in the configuration.

close() -> None

Invalidate and unlock the cache.

After calling this method, the DataFrames already extracted can still be accessed, but it’s not possible to extract new data or calculate new features.

extract_repo() -> None

Extract all the repository DataFrames.

property extraction: Repository

Return the wrapped repository (alias of repo).

property features: FeaturesCollection

Return the wrapped features.

classmethod from_config(analysis_config: SingleAnalysisConfig, simulations_config: SimulationCampaign, resolver: Resolver) -> Analyzer

Initialize the Analyzer from the given configuration.

Parameters:
  • analysis_config – analysis configuration.

  • simulations_config – simulation campaign configuration.

  • resolver – resolver instance.

property repo: Repository

Return the wrapped repository.

show()

Print all the DataFrames.

try_one(groupby: list[str]) -> tuple[NamedTuple, DataFrame]

Return the first key and df when grouping spikes by the given list of columns.

The returned values are the same ones that are passed to the user-defined feature function.

This method is intended only for internal use and debugging.

Parameters:

groupby – list of columns to group by.

Returns:

The first key and df.
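A minimal sketch of how try_one might be used while developing a feature function. The helper name preview_first_group is hypothetical, and the helper only assumes an object exposing try_one(groupby=...) as documented above; the column names passed in groupby depend on the configuration and are not specified here.

```python
def preview_first_group(analyzer, groupby):
    """Return the first (key, df) pair and print a short summary of it."""
    key, df = analyzer.try_one(groupby=groupby)
    # key is documented as a NamedTuple, so _asdict() exposes its fields by name.
    print("group key:", key._asdict())
    # len() gives the number of rows when df is a pandas DataFrame.
    print("rows in first group:", len(df))
    return key, df
```

Inspecting the key and DataFrame this way shows the inputs that a feature function would receive for one group, without running the full feature calculation.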

class blueetl.analysis.MultiAnalyzer(global_config: MultiAnalysisConfig, analyzers: dict[str, Analyzer] | None = None)

Bases: object

MultiAnalyzer class.

Initialize the object.

Parameters:
  • global_config – analysis configuration.

  • analyzers – dict of analyzers, or None to load them from the configuration.

property analyzers: dict[str, Analyzer]

Return the dict of analyzers.

apply_filter(simulations_filter: dict[str, Any] | None = None) -> MultiAnalyzer

Return a new object where the in-memory filter is applied to repo and features.

Before applying the filter, all the repo DataFrames are extracted and all the features DataFrames are calculated, if not already done.

Parameters:

simulations_filter – optional simulations filter; if not specified, use simulations_filter_in_memory from the configuration; if neither is specified, return the original object.

calculate_features() -> None

Calculate all the features defined in the configuration for all the analyses.

close() -> None

Invalidate and unlock the cache.

After calling this method, the DataFrames already extracted can still be accessed, but it’s not possible to extract new data or calculate new features.
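One way to make sure close() is always called is the standard library's contextlib.closing, sketched below. The helper name extract_and_calculate is hypothetical; it relies only on the extract_repo, calculate_features and close methods documented here, and whether the classes also support the with statement directly is not stated in this text.

```python
from contextlib import closing


def extract_and_calculate(analyzer):
    """Run extraction and feature calculation, then always call close()."""
    # closing() calls analyzer.close() on exit, even if one of the steps raises.
    with closing(analyzer):
        analyzer.extract_repo()
        analyzer.calculate_features()
    # Per the description above, DataFrames that were already extracted stay
    # accessible here, but no new data can be extracted or features calculated.
    return analyzer
```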

extract_repo() -> None

Extract all the repository DataFrames for all the analyses.

classmethod from_config(global_config: dict, base_path: str | PathLike, extra_params: dict[str, Any] | None = None) -> MultiAnalyzer

Initialize the MultiAnalyzer from the given configuration.

Parameters:
  • global_config – analysis configuration.

  • base_path – base path used to resolve relative paths in the configuration.

  • extra_params – dict of overriding parameters.

classmethod from_file(path: str | PathLike, extra_params: dict[str, Any] | None = None) -> MultiAnalyzer

Return a new instance loaded using the given configuration file.

property global_config: MultiAnalysisConfig

Return the global config instance.

property names: list[str]

Return the names of all the analyzers.
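The names and analyzers properties can be combined to visit each Analyzer in turn, as sketched below. The helper name analyzers_by_name is hypothetical, and the sketch assumes that every entry of names is a key of the analyzers dict, which seems likely from the signatures but is not stated explicitly.

```python
def analyzers_by_name(multi_analyzer):
    """Yield (name, analyzer) pairs in the order given by the names property."""
    for name in multi_analyzer.names:
        # Assumption: each name is a key of the analyzers dict.
        yield name, multi_analyzer.analyzers[name]
```

For example, `for name, analyzer in analyzers_by_name(ma): analyzer.show()` would print the DataFrames of one analysis at a time.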

show()

Print all the DataFrames.

blueetl.analysis.run_from_file(analysis_config_file: str | PathLike, seed: int | None = 0, extract: bool = True, calculate: bool = True, show: bool = False, clear_cache: bool | None = None, readonly_cache: bool | None = None, skip_features_cache: bool | None = None, loglevel: int | None = None) -> MultiAnalyzer

Initialize and return the MultiAnalyzer.

Parameters:
  • analysis_config_file – path to the analysis configuration file.

  • seed – if not None, random seed used to select random gids.

  • extract – if True, run the extraction of the repository.

  • calculate – if True, run the calculation of the features.

  • show – if True, show a short representation of all the pandas DataFrames, mainly for debugging.

  • clear_cache – if None, use the value from the configuration file. Otherwise: if True, remove any existing cache; if False, reuse the existing cache if possible.

  • readonly_cache – if None, use the value from the configuration file. Otherwise: if True, use the existing cache if possible, or raise an error; if False, use the existing cache if possible, or update it.

  • skip_features_cache – if None, use the value from the configuration file. Otherwise: if True, do not write the features to the cache; if False, write the features to the cache after calculating them.

  • loglevel – if specified, used to set up logging.

Returns:

a new MultiAnalyzer instance.
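A minimal usage sketch based on the signature above. The wrapper name run_analysis and the file name analysis_config.yaml are placeholders, not part of the library; the keyword arguments mirror the documented parameters, with the cache options left as None so that the values from the configuration file are used.

```python
import logging


def run_analysis(config_path="analysis_config.yaml"):
    """Build a MultiAnalyzer from a configuration file (the path is a placeholder)."""
    # Imported lazily so that this sketch can be loaded without blueetl installed.
    from blueetl.analysis import run_from_file

    return run_from_file(
        config_path,
        seed=0,                    # documented default
        extract=True,              # run the extraction of the repository
        calculate=True,            # run the calculation of the features
        show=False,                # do not print the DataFrames
        clear_cache=None,          # None: use the value from the configuration file
        readonly_cache=None,       # None: use the value from the configuration file
        skip_features_cache=None,  # None: use the value from the configuration file
        loglevel=logging.INFO,     # an int, as in the documented signature
    )
```

The returned MultiAnalyzer exposes the individual analyzers through its analyzers property, and close() can be called on it once no further extraction or calculation is needed.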