02 Features basics

Let’s start with an existing analysis configuration, which we copy into a temporary location to use as a working directory.

from pathlib import Path
import tempfile
from blueetl.utils import copy_config

workdir = tempfile.TemporaryDirectory(suffix="_blueetl")
workdir_path = Path(workdir.name)

config_file = workdir_path / "config.yaml"
copy_config("../data/analysis/config.yaml", config_file)

# print(config_file)
# print(config_file.read_text())

We can now initialize a MultiAnalyzer object with the following code; you can specify different parameters if needed:

from blueetl.analysis import run_from_file

ma = run_from_file(
    config_file,
    extract=False,
    calculate=False,
    show=False,
    clear_cache=True,
    loglevel="ERROR",
)
print(ma)
<blueetl.analysis.MultiAnalyzer object at 0x7f65784bfe90>

Since we passed extract=False to the previous call, we have to extract the repository explicitly:

ma.extract_repo()

And since we passed calculate=False to the previous call, we have to calculate the features explicitly:

ma.calculate_features()

We can now inspect the list of analyses in the MultiAnalyzer object:

ma.names
['spikes']

and access each of them as an Analyzer object:

ma.spikes
<blueetl.analysis.Analyzer at 0x7f6592b41250>

Each Analyzer object provides two special attributes, repo and features, which can be used to access the extracted data and the calculated features.

You can inspect the list of extracted and calculated DataFrames by calling names on them, as shown below:

ma.spikes.repo.names
['simulations', 'neurons', 'neuron_classes', 'windows', 'report']
ma.spikes.features.names
['by_gid',
 'by_gid_0_0__0',
 'by_gid_0_0__1',
 'by_gid_0_1__0',
 'by_gid_0_1__1',
 'by_gid_1_0__0',
 'by_gid_1_0__1',
 'by_gid_1_1__0',
 'by_gid_1_1__1',
 'by_gid_2_0__0',
 'by_gid_2_0__1',
 'by_gid_2_1__0',
 'by_gid_2_1__1',
 'by_gid_and_trial',
 'by_gid_and_trial_0_0__0',
 'by_gid_and_trial_0_0__1',
 'by_gid_and_trial_0_1__0',
 'by_gid_and_trial_0_1__1',
 'by_gid_and_trial_1_0__0',
 'by_gid_and_trial_1_0__1',
 'by_gid_and_trial_1_1__0',
 'by_gid_and_trial_1_1__1',
 'by_gid_and_trial_2_0__0',
 'by_gid_and_trial_2_0__1',
 'by_gid_and_trial_2_1__0',
 'by_gid_and_trial_2_1__1',
 'by_neuron_class',
 'by_neuron_class_0_0__0',
 'by_neuron_class_0_0__1',
 'by_neuron_class_0_1__0',
 'by_neuron_class_0_1__1',
 'by_neuron_class_1_0__0',
 'by_neuron_class_1_0__1',
 'by_neuron_class_1_1__0',
 'by_neuron_class_1_1__1',
 'by_neuron_class_2_0__0',
 'by_neuron_class_2_0__1',
 'by_neuron_class_2_1__0',
 'by_neuron_class_2_1__1',
 'by_neuron_class_and_trial',
 'by_neuron_class_and_trial_0_0__0',
 'by_neuron_class_and_trial_0_0__1',
 'by_neuron_class_and_trial_0_1__0',
 'by_neuron_class_and_trial_0_1__1',
 'by_neuron_class_and_trial_1_0__0',
 'by_neuron_class_and_trial_1_0__1',
 'by_neuron_class_and_trial_1_1__0',
 'by_neuron_class_and_trial_1_1__1',
 'by_neuron_class_and_trial_2_0__0',
 'by_neuron_class_and_trial_2_0__1',
 'by_neuron_class_and_trial_2_1__0',
 'by_neuron_class_and_trial_2_1__1',
 'histograms',
 'histograms_0_0__0',
 'histograms_0_0__1',
 'histograms_0_1__0',
 'histograms_0_1__1',
 'histograms_1_0__0',
 'histograms_1_0__1',
 'histograms_1_1__0',
 'histograms_1_1__1',
 'histograms_2_0__0',
 'histograms_2_0__1',
 'histograms_2_1__0',
 'histograms_2_1__1']

You can access the wrapped DataFrames using the df attribute on each object:

ma.spikes.repo.report.df
time gid window trial simulation_id circuit_id neuron_class
0 22.300 300 w1 0 0 0 VPL_INH
1 24.400 226 w1 0 0 0 VPL_INH
2 25.100 267 w1 0 0 0 VPL_INH
3 26.475 308 w1 0 0 0 VPL_INH
4 40.375 291 w1 0 0 0 VPL_INH
... ... ... ... ... ... ... ...
59 11.400 205 w2 2 1 0 VPL_INH
60 11.550 291 w2 2 1 0 VPL_INH
61 11.800 163 w2 2 1 0 VPL_INH
62 12.675 89 w2 2 1 0 VPL_INH
63 13.550 350 w2 2 1 0 VPL_INH

64 rows × 7 columns
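The wrapped object is a plain pandas DataFrame, so all the usual pandas operations apply. As a minimal illustration with mock data (the values are copied from the rows above for the example, not produced by blueetl):

```python
import pandas as pd

# mock rows mimicking the report DataFrame above (values illustrative)
df = pd.DataFrame(
    {
        "time": [22.3, 24.4, 11.4],
        "gid": [300, 226, 205],
        "window": ["w1", "w1", "w2"],
        "trial": [0, 0, 2],
    }
)

# count spikes per window, a typical starting point for further analysis
counts = df.groupby("window")["gid"].count()
print(counts.to_dict())  # {'w1': 2, 'w2': 1}
```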

The DataFrames of features can be accessed in the same way:

ma.spikes.features.by_neuron_class_0_0__0.df
mean_of_mean_spike_counts mean_of_mean_firing_rates_per_second std_of_mean_firing_rates_per_second mean_of_spike_times_normalised_hist_1ms_bin min_of_spike_times_normalised_hist_1ms_bin max_of_spike_times_normalised_hist_1ms_bin argmax_spike_times_hist_1ms_bin
simulation_id circuit_id neuron_class window
0 0 Rt_INH w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
VPL_EXC w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
VPL_INH w1 1.000000 14.285714 0.000000 0.014286 0.0 0.2 20
w2 0.733333 12.222222 5.443311 0.012222 0.0 0.1 2
1 0 Rt_INH w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
VPL_EXC w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0
VPL_INH w1 1.000000 14.285714 0.000000 0.014286 0.0 0.3 21
w2 0.733333 12.222222 5.443311 0.012222 0.0 0.1 1
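Because the features DataFrames carry a MultiIndex (simulation_id, circuit_id, neuron_class, window), the standard pandas selection tools work on them. A small sketch using mock data that mimics the index levels above (not actual blueetl output):

```python
import pandas as pd

# mock of a features DataFrame with the same MultiIndex levels (values illustrative)
index = pd.MultiIndex.from_tuples(
    [(0, 0, "VPL_INH", "w1"), (0, 0, "VPL_INH", "w2")],
    names=["simulation_id", "circuit_id", "neuron_class", "window"],
)
df = pd.DataFrame({"mean_of_mean_spike_counts": [1.0, 0.733333]}, index=index)

# select all windows for one neuron class with a pandas cross-section
selected = df.xs("VPL_INH", level="neuron_class")
print(selected)
```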

In this case, the attrs dictionary attached to the DataFrame is also populated with the parameters used for the computation:

ma.spikes.features.by_neuron_class_0_0__0.df.attrs
{'config': {'type': 'multi',
  'name': None,
  'groupby': ['simulation_id', 'circuit_id', 'neuron_class', 'window'],
  'function': 'blueetl.external.bnac.calculate_features.calculate_features_multi',
  'neuron_classes': [],
  'windows': [],
  'params': {'export_all_neurons': True,
   'ratio': 0.25,
   'nested_example': {'params': {'bin_size': 1}},
   'param1': 10,
   'param2': 11},
  'params_product': {},
  'params_zip': {},
  'suffix': '_0_0__0',
  'multi_index': True}}

The parameters have been automatically computed by combining params, params_product, and params_zip from the original configuration.

In this case, it may be convenient to access a single DataFrame containing the concatenation of the features of the same type, with the varying parameters added as new columns.

The name of this DataFrame is the same as that of the split DataFrames, without the suffix:

ma.spikes.features.by_neuron_class.df
mean_of_mean_spike_counts mean_of_mean_firing_rates_per_second std_of_mean_firing_rates_per_second mean_of_spike_times_normalised_hist_1ms_bin min_of_spike_times_normalised_hist_1ms_bin max_of_spike_times_normalised_hist_1ms_bin argmax_spike_times_hist_1ms_bin params_id ratio bin_size param1 param2
simulation_id circuit_id neuron_class window
0 0 Rt_INH w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 0 0.25 1 10 11
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 0 0.25 1 10 11
VPL_EXC w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 0 0.25 1 10 11
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 0 0.25 1 10 11
VPL_INH w1 1.000000 14.285714 0.000000 0.014286 0.0 0.2 20 0 0.25 1 10 11
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1 0 Rt_INH w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 11 0.75 2 20 21
VPL_EXC w1 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 11 0.75 2 20 21
w2 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0 11 0.75 2 20 21
VPL_INH w1 1.000000 14.285714 0.000000 0.014286 0.0 0.3 21 11 0.75 2 20 21
w2 0.733333 12.222222 5.443311 0.012222 0.0 0.1 1 11 0.75 2 20 21

144 rows × 12 columns

Note that the column names in the previous DataFrame have been shortened. You can see the full names in the aliases DataFrame:

ma.spikes.features.by_neuron_class.aliases
column alias
0 ratio ratio
1 nested_example.params.bin_size bin_size
2 param1 param1
3 param2 param2
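Judging from the table, the alias appears to be the last component of the dotted parameter path. This is an assumption about the naming rule inferred from the output above, not documented blueetl behavior:

```python
def alias(column: str) -> str:
    """Guess the shortened alias of a parameter column.

    Assumption: the alias is the last component of a dotted parameter path.
    """
    return column.rsplit(".", 1)[-1]

print(alias("nested_example.params.bin_size"))  # bin_size
print(alias("ratio"))  # ratio
```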

You can also inspect all the parameters that were used for the computation by accessing the params attribute:

ma.spikes.features.by_neuron_class.params
export_all_neurons ratio nested_example.params.bin_size param1 param2
params_id
0 True 0.25 1 10 11
1 True 0.25 1 20 21
2 True 0.25 2 10 11
3 True 0.25 2 20 21
4 True 0.50 1 10 11
5 True 0.50 1 20 21
6 True 0.50 2 10 11
7 True 0.50 2 20 21
8 True 0.75 1 10 11
9 True 0.75 1 20 21
10 True 0.75 2 10 11
11 True 0.75 2 20 21
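The twelve parameter sets above can be reproduced with a short sketch: params_product contributes the Cartesian product of its value lists, params_zip contributes pairwise-zipped values, and params supplies the constants. This is an illustration of the combination logic as inferred from the table, not blueetl’s actual implementation:

```python
from itertools import product

# constants, product lists, and zip lists, as in the configuration above
params = {"export_all_neurons": True}
params_product = {"ratio": [0.25, 0.5, 0.75], "nested_example.params.bin_size": [1, 2]}
params_zip = {"param1": [10, 20], "param2": [11, 21]}

combos = []
for prod_values in product(*params_product.values()):
    for zip_values in zip(*params_zip.values()):
        combo = dict(params)
        combo.update(zip(params_product.keys(), prod_values))
        combo.update(zip(params_zip.keys(), zip_values))
        combos.append(combo)

# 3 ratios x 2 bin sizes x 2 zipped pairs -> 12 parameter sets
print(len(combos))  # 12
print(combos[0])
```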

During the extraction and computation, some files have been created to be used as a cache.

Usually you don’t need to access them directly; if they are deleted, they will be recreated at the next run.

They may also be automatically deleted when the cache is invalidated.

!cd {workdir.name} && tree
.
├── analysis_output
│   └── spikes
│       ├── config
│       │   ├── analysis_config.cached.yaml
│       │   ├── checksums.cached.yaml
│       │   └── simulations_config.cached.yaml
│       ├── features
│       │   ├── by_gid_0_0__0.parquet
│       │   ├── by_gid_0_0__1.parquet
│       │   ├── by_gid_0_1__0.parquet
│       │   ├── by_gid_0_1__1.parquet
│       │   ├── by_gid_1_0__0.parquet
│       │   ├── by_gid_1_0__1.parquet
│       │   ├── by_gid_1_1__0.parquet
│       │   ├── by_gid_1_1__1.parquet
│       │   ├── by_gid_2_0__0.parquet
│       │   ├── by_gid_2_0__1.parquet
│       │   ├── by_gid_2_1__0.parquet
│       │   ├── by_gid_2_1__1.parquet
│       │   ├── by_gid_and_trial_0_0__0.parquet
│       │   ├── by_gid_and_trial_0_0__1.parquet
│       │   ├── by_gid_and_trial_0_1__0.parquet
│       │   ├── by_gid_and_trial_0_1__1.parquet
│       │   ├── by_gid_and_trial_1_0__0.parquet
│       │   ├── by_gid_and_trial_1_0__1.parquet
│       │   ├── by_gid_and_trial_1_1__0.parquet
│       │   ├── by_gid_and_trial_1_1__1.parquet
│       │   ├── by_gid_and_trial_2_0__0.parquet
│       │   ├── by_gid_and_trial_2_0__1.parquet
│       │   ├── by_gid_and_trial_2_1__0.parquet
│       │   ├── by_gid_and_trial_2_1__1.parquet
│       │   ├── by_neuron_class_0_0__0.parquet
│       │   ├── by_neuron_class_0_0__1.parquet
│       │   ├── by_neuron_class_0_1__0.parquet
│       │   ├── by_neuron_class_0_1__1.parquet
│       │   ├── by_neuron_class_1_0__0.parquet
│       │   ├── by_neuron_class_1_0__1.parquet
│       │   ├── by_neuron_class_1_1__0.parquet
│       │   ├── by_neuron_class_1_1__1.parquet
│       │   ├── by_neuron_class_2_0__0.parquet
│       │   ├── by_neuron_class_2_0__1.parquet
│       │   ├── by_neuron_class_2_1__0.parquet
│       │   ├── by_neuron_class_2_1__1.parquet
│       │   ├── by_neuron_class_and_trial_0_0__0.parquet
│       │   ├── by_neuron_class_and_trial_0_0__1.parquet
│       │   ├── by_neuron_class_and_trial_0_1__0.parquet
│       │   ├── by_neuron_class_and_trial_0_1__1.parquet
│       │   ├── by_neuron_class_and_trial_1_0__0.parquet
│       │   ├── by_neuron_class_and_trial_1_0__1.parquet
│       │   ├── by_neuron_class_and_trial_1_1__0.parquet
│       │   ├── by_neuron_class_and_trial_1_1__1.parquet
│       │   ├── by_neuron_class_and_trial_2_0__0.parquet
│       │   ├── by_neuron_class_and_trial_2_0__1.parquet
│       │   ├── by_neuron_class_and_trial_2_1__0.parquet
│       │   ├── by_neuron_class_and_trial_2_1__1.parquet
│       │   ├── histograms_0_0__0.parquet
│       │   ├── histograms_0_0__1.parquet
│       │   ├── histograms_0_1__0.parquet
│       │   ├── histograms_0_1__1.parquet
│       │   ├── histograms_1_0__0.parquet
│       │   ├── histograms_1_0__1.parquet
│       │   ├── histograms_1_1__0.parquet
│       │   ├── histograms_1_1__1.parquet
│       │   ├── histograms_2_0__0.parquet
│       │   ├── histograms_2_0__1.parquet
│       │   ├── histograms_2_1__0.parquet
│       │   └── histograms_2_1__1.parquet
│       └── repo
│           ├── neuron_classes.parquet
│           ├── neurons.parquet
│           ├── report.parquet
│           ├── simulations.parquet
│           └── windows.parquet
└── config.yaml

5 directories, 69 files

You can remove the entire working directory when you no longer need it:

workdir.cleanup()
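Alternatively, tempfile.TemporaryDirectory can be used as a context manager, so the cleanup happens automatically when the block exits:

```python
import tempfile
from pathlib import Path

# the temporary directory removes itself when the with-block exits
with tempfile.TemporaryDirectory(suffix="_blueetl") as name:
    workdir_path = Path(name)
    assert workdir_path.is_dir()

# the directory is gone after the context manager exits
assert not workdir_path.exists()
```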