Usage¶
Core Transformations¶
To use the Core Transformations provided by the .etl accessor with any Pandas DataFrame or Series, it is enough to import BlueETL and call the desired methods.
For example:
import blueetl
import pandas as pd
df = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 5]})
df = df.etl.q(a=1)
See the Jupyter notebook 01 Core transformations for more information and examples.
Analysis of reports¶
Basic usage¶
To perform the analysis of reports across multiple simulations in a given simulation campaign, a configuration file needs to be provided.
The configuration file should specify in particular:
simulation_campaign
: path to the JSON configuration file of the Simulation Campaign produced by bbp-workflow.
output
: path to the output directory where the results are stored. It's also used as cache, and existing stale files may be automatically deleted.
analysis
: dictionary containing a key for each report to be analyzed.
and, for each report:
extraction
: configuration dictionary used to extract data from the report, by window and neuron class.
features
: list of configuration dictionaries used to calculate the features.
See the Configuration page for full reference and examples.
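For orientation, a minimal configuration might look like the following sketch. The report name (spikes) and all paths are placeholders, and the keys inside extraction are elided; the Configuration page documents the actual schema:

```yaml
# Minimal sketch only: all values are placeholders,
# see the Configuration page for the actual schema.
simulation_campaign: /path/to/campaign/config.json
output: /path/to/output/dir
analysis:
  spikes:
    extraction:
      # extraction parameters, by window and neuron class
      ...
    features:
      # list of feature configuration dictionaries
      - ...
```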
A simple way to initialize a MultiAnalyzer object from the configuration file in your code is:
from blueetl.analysis import run_from_file
ma = run_from_file("analysis_config.yaml", loglevel="INFO")
The code above will automatically execute the extraction of the report data, and the calculation of the features.
If you prefer to execute the extraction and calculation steps manually, you could use instead:
from blueetl.analysis import run_from_file
ma = run_from_file("analysis_config.yaml", loglevel="INFO", extract=False, calculate=False)
You can also specify other parameters:
seed
(int): to set a specific seed, or None if you don't want to initialize the random number generator used to select random neurons.
clear_cache
(bool): True or False to force clearing or keeping any existing cache, regardless of the value in the configuration file.
show
(bool): True to print a short representation of all the DataFrames, sometimes useful for a quick inspection.
If not already done automatically with the initialization code above, you can execute the extraction of the data from the report and the calculation of the features with:
ma.extract_repo()
ma.calculate_features()
In the case of a spikes report, the resulting dataframes are accessible as:
ma.spikes.repo.simulations.df
ma.spikes.repo.neurons.df
ma.spikes.repo.neuron_classes.df
ma.spikes.repo.trial_steps.df
ma.spikes.repo.windows.df
ma.spikes.repo.report.df
ma.spikes.features.<custom_name_1>.df
ma.spikes.features.<custom_name_2>.df
...
The list of the available names of the reports can be obtained with:
ma.names
The list of the available names of the dataframes can be obtained with:
ma.spikes.repo.names
ma.spikes.features.names
Command Line Interface¶
BlueETL includes a simple CLI providing a few subcommands:
$ blueetl --help
Usage: blueetl [OPTIONS] COMMAND [ARGS]...
The CLI entry point.
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
run Run the analysis.
migrate-config Migrate a configuration file.
validate-config Validate a configuration file.
convert-spikes Convert spikes in CSV format.
To extract and calculate features without writing additional code, you can use the run subcommand:
$ blueetl run --help
Usage: blueetl run [OPTIONS] ANALYSIS_CONFIG_FILE
Run the analysis.
Options:
--seed INTEGER Pseudo-random generator seed [default: 0]
--extract / --no-extract Extract (or load from the cache) the
repository.
--calculate / --no-calculate Calculate (or load from the cache) the
features.
--show / --no-show Show repository and features dataframes.
--clear-cache / --no-clear-cache
If True, force clearing the cache.
--readonly-cache / --no-readonly-cache
If True, use the existing cache if possible,
or raise an error if not.
--skip-features-cache / --no-skip-features-cache
If True, do not write the features to the
cache.
-i, --interactive / --no-interactive
Start an interactive IPython shell.
-v, --verbose -v for INFO, -vv for DEBUG
--help Show this message and exit.
To validate the configuration file without running the analysis, you can use the validate-config subcommand:
$ blueetl validate-config --help
Usage: blueetl validate-config [OPTIONS] ANALYSIS_CONFIG_FILE
Validate a configuration file.
Options:
--help Show this message and exit.
To migrate an old configuration, you can use the migrate-config subcommand:
$ blueetl migrate-config --help
Usage: blueetl migrate-config [OPTIONS] INPUT_CONFIG_FILE OUTPUT_CONFIG_FILE
Migrate a configuration file.
Options:
--sort / --no-sort Sort the root keys. [default: sort]
--help Show this message and exit.
Output and caching¶
The extracted dataframes are saved into the configured output directory.
Warning
It is important to understand the caching strategy. The cache can be manually deleted to ensure that everything is recalculated from scratch.
The dataframes are automatically loaded and used as cache if the MultiAnalyzer object is recreated using the same configuration, or they may be automatically deleted and rebuilt if the configuration has changed.
If only some parts of the configuration have changed, only the invalid dataframes are deleted and rebuilt.
In particular, given this ordered list of extracted dataframes:
simulations
neurons
neuron_classes
trial_steps
windows
report
all the features dataframes
these rules apply:
If the Simulation Campaign configuration specified by simulation_campaign changed, all the dataframes are rebuilt.
If any of neuron_classes, limit, target changed in the extraction section of the configuration, then the neurons dataframe and all the following are rebuilt.
If any of windows and trial_steps changed in the extraction section of the configuration, then the trial_steps dataframe and all the following are rebuilt.
If a feature configuration changed in the features section of the configuration, then the corresponding dataframes are rebuilt.
If a feature configuration has been removed from the features section of the configuration, then the corresponding dataframes are deleted.
If a feature configuration is unchanged, then the corresponding dataframes are loaded from the cache, regardless of any change in the Python function.
Because of this, if you changed the logic of the function, you may need to manually delete the cached dataframes.
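The cascading invalidation described by these rules can be sketched in a few lines of Python. This is only an illustration of the ordering, not BlueETL's actual cache implementation:

```python
# Illustration only: not BlueETL's actual cache code.
# A change affecting one dataframe invalidates it and every later one.
EXTRACTION_ORDER = [
    "simulations",
    "neurons",
    "neuron_classes",
    "trial_steps",
    "windows",
    "report",
]

def dataframes_to_rebuild(first_stale: str) -> list:
    """Return the dataframes to rebuild when `first_stale` is invalidated."""
    index = EXTRACTION_ORDER.index(first_stale)
    # everything from the first stale dataframe onward must be rebuilt
    return EXTRACTION_ORDER[index:]
```

The features dataframes come after report in this ordering, so any change that invalidates report also invalidates all the features dataframes.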
When simulations_filter is specified in the configuration:
If the new filter is narrower or equal to the filter used to generate the old cache, then the old cache is used to produce the new filtered dataframes, and the cache is replaced if different.
If the new filter is broader than the filter used to generate the old cache, then the old cache is deleted and rebuilt.
Examples of narrower and broader filters:
the filter {"key": 1} is narrower than {"key": [1, 2]}
the filter {"key": {"lt": 3}} is narrower than {"key": {"lt": 4}}
the filter {"key": {"le": 3, "ge": 1}} is narrower than {"key": {"le": 4}}
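The subset semantics behind these comparisons can be made concrete with a small sketch. The function below only illustrates the simple cases shown above (plain values, lists of values, and lt/le/gt/ge bounds); it is not BlueETL's actual implementation:

```python
# Sketch only: not BlueETL's actual filter comparison.
def is_narrower_or_equal(new: dict, old: dict) -> bool:
    """Return True if filter `new` selects a subset of what `old` selects."""
    for key, old_cond in old.items():
        if key not in new:
            return False  # `new` has no constraint on this key, so it is broader
        new_cond = new[key]
        if isinstance(old_cond, dict):
            if not isinstance(new_cond, dict):
                return False  # mixed shapes are out of scope for this sketch
            for op, bound in old_cond.items():
                if op not in new_cond:
                    return False  # the old bound has no counterpart in `new`
                if op in ("lt", "le") and new_cond[op] > bound:
                    return False  # looser upper bound
                if op in ("gt", "ge") and new_cond[op] < bound:
                    return False  # looser lower bound
        else:
            # plain values and lists are compared as sets of accepted values
            old_values = set(old_cond) if isinstance(old_cond, list) else {old_cond}
            new_values = set(new_cond) if isinstance(new_cond, list) else {new_cond}
            if not new_values <= old_values:
                return False
    return True
```

Under this sketch, all three examples above evaluate as narrower, while reversing any of them (for example {"key": [1, 2]} against {"key": 1}) does not.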