blueetl.analysis
Analysis functions.
Functions
run_from_file | Initialize and return the MultiAnalyzer.
Classes
Analyzer | Analyzer class.
MultiAnalyzer | MultiAnalyzer class.
- class blueetl.analysis.Analyzer(analysis_config: SingleAnalysisConfig, repo: Repository, features: FeaturesCollection)
Bases: object
Analyzer class.
Initialize the object.
- Parameters:
analysis_config – analysis configuration.
repo – Repository instance.
features – FeaturesCollection instance.
- property analysis_config: SingleAnalysisConfig
Return the wrapped analysis configuration.
- apply_filter(simulations_filter: dict[str, Any] | None = None) -> Analyzer
Return a new object where the in-memory filter is applied to repo and features.
Before the filter is applied, all the repo DataFrames are extracted and all the features DataFrames are calculated, if this has not already been done.
- Parameters:
simulations_filter – optional simulations filter; if not specified, use simulations_filter_in_memory from the configuration; if neither is specified, return the original object.
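The fallback order described above (explicit argument, then the configured filter, then the original object) can be sketched in plain Python. This is only an illustration, not blueetl's implementation: `FilterableSketch`, its `rows` attribute, and the equality-only matching are hypothetical, and treating "not specified" as `None` is an assumption.

```python
class FilterableSketch:
    """Hypothetical stand-in used to illustrate the fallback order only."""

    def __init__(self, rows, configured_filter=None):
        self.rows = rows
        # Plays the role of simulations_filter_in_memory from the configuration.
        self.configured_filter = configured_filter

    def apply_filter(self, simulations_filter=None):
        # 1. Prefer the explicit argument; 2. fall back to the configured filter.
        chosen = simulations_filter
        if chosen is None:
            chosen = self.configured_filter
        # 3. If neither is given, return the original object unchanged.
        if chosen is None:
            return self
        # Assumption: plain equality matching on each key of the filter.
        kept = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in chosen.items())
        ]
        return FilterableSketch(kept, self.configured_filter)
```

Returning the original object when no filter applies means callers can call `apply_filter()` unconditionally and compare identities to detect whether anything was filtered.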
- calculate_features() -> None
Calculate all the features defined in the configuration.
- close() -> None
Invalidate and unlock the cache.
After calling this method, the DataFrames already extracted can still be accessed, but it’s not possible to extract new data or calculate new features.
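The contract described for close() can be modelled with a small stand-in: data extracted before closing stays readable, while new extraction is refused afterwards. The class below is hypothetical and does not reflect blueetl's internals. In particular, raising `RuntimeError` is an assumption, since the documentation only says that new extraction is not possible.

```python
class ClosableCacheSketch:
    """Hypothetical model of the documented close() behaviour."""

    def __init__(self):
        self._extracted = {}
        self._closed = False

    def extract(self, name):
        # Data that was already extracted remains accessible, even after close().
        if name in self._extracted:
            return self._extracted[name]
        # Assumption: new work after close() fails with RuntimeError.
        if self._closed:
            raise RuntimeError(f"cannot extract {name!r}: the cache is closed")
        self._extracted[name] = [f"{name}-row"]
        return self._extracted[name]

    def close(self):
        # Invalidate the cache; nothing that is already in memory is discarded.
        self._closed = True
```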
- extract_repo() -> None
Extract all the repository DataFrames.
- property extraction: Repository
Return the wrapped repository (alias of repo).
- property features: FeaturesCollection
Return the wrapped features.
- classmethod from_config(analysis_config: SingleAnalysisConfig, simulations_config: SimulationCampaign, resolver: Resolver) -> Analyzer
Initialize the Analyzer from the given configuration.
- Parameters:
analysis_config – analysis configuration.
simulations_config – simulation campaign configuration.
resolver – resolver instance.
- property repo: Repository
Return the wrapped repository.
- show()
Print all the DataFrames.
- try_one(groupby: list[str]) -> tuple[NamedTuple, DataFrame]
Return the first key and DataFrame obtained when grouping spikes by the given list of columns.
The returned values are the same as those passed to the user-defined feature function.
This method is intended only for internal use and debugging.
- Parameters:
groupby – list of columns to group by.
- Returns:
The first key and df.
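As a rough analogue of "the first key and df", the sketch below groups plain dictionaries with the standard library and returns the first named key together with its rows. It is not how blueetl implements try_one, which works on DataFrames. The function `first_group`, the column names, and the sample rows are hypothetical.

```python
from collections import namedtuple
from itertools import groupby


def first_group(rows, columns):
    """Return (key, rows) for the first group when grouping by `columns`."""
    Key = namedtuple("Key", columns)

    def key_of(row):
        return tuple(row[column] for column in columns)

    # itertools.groupby only merges adjacent items, so sort by the key first.
    ordered = sorted(rows, key=key_of)
    for key, group in groupby(ordered, key=key_of):
        return Key(*key), list(group)
    raise ValueError("no rows to group")


# Hypothetical spike-like records.
rows = [
    {"simulation_id": 1, "neuron_class": "L1", "time": 3.0},
    {"simulation_id": 0, "neuron_class": "L1", "time": 1.0},
    {"simulation_id": 0, "neuron_class": "L1", "time": 2.0},
]
key, group = first_group(rows, ["simulation_id", "neuron_class"])
```

Inspecting one `(key, group)` pair like this is a quick way to check what a feature function would receive before running it on every group.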
- class blueetl.analysis.MultiAnalyzer(global_config: MultiAnalysisConfig, analyzers: dict[str, Analyzer] | None = None)
Bases: object
MultiAnalyzer class.
Initialize the object.
- Parameters:
global_config – analysis configuration.
analyzers – dict of analyzers, or None to load them from the configuration.
- apply_filter(simulations_filter: dict[str, Any] | None = None) -> MultiAnalyzer
Return a new object where the in-memory filter is applied to repo and features.
Before the filter is applied, all the repo DataFrames are extracted and all the features DataFrames are calculated, if this has not already been done.
- Parameters:
simulations_filter – optional simulations filter; if not specified, use simulations_filter_in_memory from the configuration; if neither is specified, return the original object.
- calculate_features() -> None
Calculate all the features defined in the configuration for all the analyses.
- close() -> None
Invalidate and unlock the cache.
After calling this method, the DataFrames already extracted can still be accessed, but it’s not possible to extract new data or calculate new features.
- extract_repo() -> None
Extract all the repository DataFrames for all the analyses.
- classmethod from_config(global_config: dict, base_path: str | PathLike, extra_params: dict[str, Any] | None = None) -> MultiAnalyzer
Initialize the MultiAnalyzer from the given configuration.
- Parameters:
global_config – analysis configuration.
base_path – base path used to resolve relative paths in the configuration.
extra_params – dict of overriding parameters.
- classmethod from_file(path: str | PathLike, extra_params: dict[str, Any] | None = None) -> MultiAnalyzer
Return a new instance loaded using the given configuration file.
- property global_config: MultiAnalysisConfig
Return the global config instance.
- property names: list[str]
Return the names of all the analyzers.
- show()
Print all the DataFrames.
- blueetl.analysis.run_from_file(analysis_config_file: str | PathLike, seed: int | None = 0, extract: bool = True, calculate: bool = True, show: bool = False, clear_cache: bool | None = None, readonly_cache: bool | None = None, skip_features_cache: bool | None = None, loglevel: int | None = None) -> MultiAnalyzer
Initialize and return the MultiAnalyzer.
- Parameters:
analysis_config_file – path to the analysis configuration file.
seed – if not None, random seed used to select random gids.
extract – if True, run the extraction of the repository.
calculate – if True, run the calculation of the features.
show – if True, show a short representation of all the Pandas DataFrames, mainly for debugging.
clear_cache – if None, use the value from the configuration file. Otherwise: if True, remove any existing cache; if False, reuse the existing cache if possible.
readonly_cache – if None, use the value from the configuration file. Otherwise: if True, use the existing cache if possible, or raise an error; if False, use the existing cache if possible, or update it.
skip_features_cache – if None, use the value from the configuration file. Otherwise: if True, do not write the features to the cache; if False, write the features to the cache after calculating them.
loglevel – if specified, used to set up logging.
- Returns:
a new MultiAnalyzer instance.
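A typical call sequence might look like the sketch below: run the analysis from a configuration file, read the analyzer names, and close the MultiAnalyzer when done. It uses only the parameters, the names property, and the close() method listed on this page. The helper `list_analysis_names`, its `runner` argument, and the stand-in class are hypothetical additions so that the example can be exercised without blueetl installed.

```python
def list_analysis_names(config_file, runner=None):
    """Run the analysis and return the analyzer names, then close the cache."""
    if runner is None:
        # Imported lazily so the sketch can be loaded without blueetl.
        from blueetl.analysis import run_from_file as runner
    multi = runner(
        config_file,
        seed=0,            # documented default
        extract=True,      # run the extraction of the repository
        calculate=True,    # run the calculation of the features
        show=False,
        clear_cache=None,  # None: use the value from the configuration file
    )
    try:
        return list(multi.names)
    finally:
        # Invalidate and unlock the cache even if reading the names fails.
        multi.close()


class StandInMultiAnalyzer:
    """Hypothetical stand-in exposing only `names` and `close()`."""

    names = ["spikes"]

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True
```

With blueetl installed, calling `list_analysis_names(path)` without a `runner` would use the real `run_from_file`, where `path` points to an existing analysis configuration file.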