blueetl.cache

Cache Manager.

Classes

CacheManager(analysis_config, simulations_config)

Cache Manager.

CoupledCache(cached, actual)

Container of cached and actual configurations.

DummyLockManager(*args, **kwargs)

Dummy Lock Manager.

LockManager(path)

Lock Manager.

LockManagerProtocol(*args, **kwargs)

Lock Manager Interface.

Exceptions

CacheError

Error raised on an attempt to write to a read-only cache.

exception blueetl.cache.CacheError

Bases: Exception

Error raised on an attempt to write to a read-only cache.

class blueetl.cache.CacheManager(analysis_config: SingleAnalysisConfig, simulations_config: SimulationCampaign)

Bases: object

Cache Manager.

Initialize the object.

Parameters:
  • analysis_config – analysis configuration.

  • simulations_config – simulations campaign configuration.

close() → None

Close the cache manager and unlock the lock directory.

After calling this method, the CacheManager instance should no longer be used.

dump_features(features_dict: dict[str, DataFrame], features_config: FeaturesConfig) → None

Write features dataframes to the cache.

The cache key is determined by the hash of features_config.

Parameters:
  • features_dict – dict of features to be written.

  • features_config – configuration dict of the features to be written.
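The idea of keying a cache entry on the hash of a configuration can be sketched as follows. This is only an illustration under stated assumptions: the helper name config_checksum, the JSON serialization, and the choice of SHA-256 are hypothetical, and the source does not say how blueetl actually computes the hash.

```python
import hashlib
import json


def config_checksum(config: dict) -> str:
    """Return a stable hex digest for a JSON-serializable config (hypothetical helper)."""
    # sort_keys makes the serialization independent of key insertion order
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content in a different key order yields the same key
a = config_checksum({"type": "multi", "groupby": ["simulation_id", "circuit_id"]})
b = config_checksum({"groupby": ["simulation_id", "circuit_id"], "type": "multi"})
# Different content yields a different key
c = config_checksum({"type": "multi", "groupby": ["simulation_id"]})
```

With a key of this kind, two runs that share the same features configuration would map to the same cache entry, while any change to the configuration would map to a new one.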

dump_repo(df: DataFrame, name: str) → None

Write a specific repo dataframe to the cache.

Parameters:
  • df – dataframe to be saved.

  • name – name of the repo dataframe.

features_cache_needs_filter(features_config: FeaturesConfig) → bool

Return True if the cached features need to be filtered.

This happens when the cache is used, but the actual filter is more specific than the cached filter.
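One way to picture "more specific" is with a deliberately simplified model in which a filter maps each column name to the set of allowed values, and a missing column means no restriction. The function is_more_specific and this filter representation are assumptions made for illustration only; the filters supported by blueetl may be richer.

```python
def is_more_specific(actual: dict[str, set], cached: dict[str, set]) -> bool:
    """Return True if `actual` is at least as restrictive as `cached` and not identical.

    Simplified model: each filter maps a column name to the set of allowed
    values; a column that is absent is not restricted at all.
    """
    if actual == cached:
        return False
    for column, cached_values in cached.items():
        # every restriction in the cached filter must be kept or tightened
        if column not in actual or not actual[column] <= cached_values:
            return False
    return True


cached_filter = {"simulation_id": {0, 1, 2}}
narrower = is_more_specific({"simulation_id": {0, 1}}, cached_filter)
identical = is_more_specific({"simulation_id": {0, 1, 2}}, cached_filter)
wider = is_more_specific({}, cached_filter)
```

In this model, only the first case would allow the cached data to be reused after filtering: the cached rows are a superset of the rows that are needed.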

get_cached_features_checksums(features_config: FeaturesConfig) → dict[str, dict[str, str]]

Return the cached features checksums, or an empty dict if the cache doesn’t exist.

is_repo_cached(name: str) → bool

Return whether a specific repo dataframe is present in the cache.

load_features(features_config: FeaturesConfig) → dict[str, DataFrame] | None

Load features dataframes from the cache.

The cache key is determined by the hash of features_config.

Parameters:

features_config – configuration dict of the features to be loaded.

Returns:

Dict of dataframes, or None if they are not cached.

load_repo(name: str) → DataFrame | None

Load a specific repo dataframe from the cache.

Parameters:

name – name of the repo dataframe.

Returns:

The loaded dataframe, or None if it’s not cached.
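Because load_repo returns None on a cache miss, a caller can follow a load-or-compute pattern together with dump_repo. The sketch below uses a hypothetical in-memory stand-in (InMemoryRepoCache) that only mimics the two documented signatures, and plain lists in place of dataframes, so that it runs without blueetl or pandas.

```python
from typing import Any, Callable, Optional


class InMemoryRepoCache:
    """Hypothetical stand-in exposing load_repo/dump_repo like the methods above."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def load_repo(self, name: str) -> Optional[Any]:
        # None signals a cache miss, matching the documented return value
        return self._store.get(name)

    def dump_repo(self, df: Any, name: str) -> None:
        self._store[name] = df


def load_or_compute(cache: Any, name: str, compute: Callable[[], Any]) -> Any:
    """Return the cached object for `name`, computing and storing it on a miss."""
    df = cache.load_repo(name)
    if df is None:
        df = compute()
        cache.dump_repo(df, name)
    return df


calls = []


def extract_simulations() -> list:
    calls.append("computed")
    return [{"simulation_id": 0}, {"simulation_id": 1}]


cache = InMemoryRepoCache()
first = load_or_compute(cache, "simulations", extract_simulations)
second = load_or_compute(cache, "simulations", extract_simulations)
```

The explicit `is None` check matters with real dataframes, because the truth value of a pandas DataFrame is ambiguous and evaluating it in a boolean context raises an error.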

property locked: bool

Return True if the cache manager is locking the cache, False otherwise.

property readonly: bool

Return True if the cache manager is set to read-only, False otherwise.

repo_cache_needs_filter(name: str) → bool

Return True if the cached repo needs to be filtered.

This happens when the cache is used, but the actual filter is more specific than the cached filter.

to_readonly() → CacheManager

Return a read-only copy of the object.

The returned object will raise CacheError if any writing method is called.
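The read-only copy pattern can be sketched with a small, self-contained class. TinyCache, its dump and load methods, and the local CacheError are hypothetical stand-ins written for this example; they are not the blueetl implementation.

```python
import copy


class CacheError(Exception):
    """Local stand-in for blueetl.cache.CacheError."""


class TinyCache:
    """Hypothetical cache illustrating the read-only copy pattern."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self._readonly = False

    @property
    def readonly(self) -> bool:
        return self._readonly

    def to_readonly(self) -> "TinyCache":
        # shallow copy: the copy shares the stored data but refuses writes
        clone = copy.copy(self)
        clone._readonly = True
        return clone

    def dump(self, name: str, value: object) -> None:
        if self._readonly:
            raise CacheError("the cache is read-only")
        self._data[name] = value

    def load(self, name: str) -> object:
        return self._data.get(name)


cache = TinyCache()
cache.dump("simulations", [1, 2, 3])
ro = cache.to_readonly()
try:
    ro.dump("simulations", [])
    error = None
except CacheError as exc:
    error = str(exc)
```

A copy of this kind can be handed to code that should only read from the cache, for example worker processes, while the original object keeps its write access.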

class blueetl.cache.CoupledCache(cached: ConfigT | None, actual: ConfigT)

Bases: Generic[ConfigT]

Container of cached and actual configurations.

class blueetl.cache.DummyLockManager(*args, **kwargs)

Bases: LockManagerProtocol

Dummy Lock Manager.

lock(mode: int) → None

Pretend to lock.

property locked: bool

Always return True.

unlock() → None

Pretend to unlock.

class blueetl.cache.LockManager(path: PathLike)

Bases: LockManagerProtocol

Lock Manager.

On Linux, the flock call is handled locally, and the underlying filesystem (GPFS) does not get any notification that locks are being set. Therefore, GPFS cannot enforce locks across nodes, and the lock only protects against concurrent access from processes running on the same node.

Initialize the object.

Parameters:

path – path to an existing directory to be used for locking.

lock(mode: int) → None

Lock the directory.

Parameters:

mode – LockManager.LOCK_EX for exclusive lock, or LockManager.LOCK_SH for shared lock.

property locked: bool

Return True if the lock manager is locking the cache, False otherwise.

unlock() → None

Unlock the directory.
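A minimal sketch of flock-based locking on a directory, using only the standard library, is shown below. DirectoryLock is a hypothetical class that mirrors the lock/unlock/locked interface above; the non-blocking behaviour (LOCK_NB) is an assumption of this example and may differ from what LockManager does. The fcntl module is available on POSIX systems only.

```python
import fcntl
import os
import tempfile


class DirectoryLock:
    """Hypothetical flock-based lock on a directory (local to one node)."""

    LOCK_EX = fcntl.LOCK_EX
    LOCK_SH = fcntl.LOCK_SH

    def __init__(self, path: str) -> None:
        self._path = path
        self._fd = None

    @property
    def locked(self) -> bool:
        return self._fd is not None

    def lock(self, mode: int) -> None:
        if self.locked:
            return
        fd = os.open(self._path, os.O_RDONLY)
        try:
            # LOCK_NB makes the call fail immediately instead of blocking
            fcntl.flock(fd, mode | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            raise
        self._fd = fd

    def unlock(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


with tempfile.TemporaryDirectory() as tmp:
    first, second = DirectoryLock(tmp), DirectoryLock(tmp)
    first.lock(DirectoryLock.LOCK_EX)
    try:
        # a second exclusive lock on the same directory must be refused
        second.lock(DirectoryLock.LOCK_EX)
        conflict = False
    except OSError:
        conflict = True
    first.unlock()
    second.lock(DirectoryLock.LOCK_SH)
    acquired_after_release = second.locked
    second.unlock()
```

Each DirectoryLock opens its own file descriptor, so the two instances conflict even within a single process. As noted above, this protection does not extend to processes on other nodes.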

class blueetl.cache.LockManagerProtocol(*args, **kwargs)

Bases: Protocol

Lock Manager Interface.

abstract lock(mode: int) → None

Lock.

abstract property locked: bool

Return the lock status.

abstract unlock() → None

Unlock.
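Since the interface is a typing Protocol, any class that provides lock, unlock and locked with compatible signatures satisfies it structurally, without inheriting from it. The sketch below defines a local protocol (LockLike) that mirrors the documented members, plus a hypothetical RecordingLock and run_locked helper; none of these names come from blueetl.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LockLike(Protocol):
    """Hypothetical mirror of the interface documented above."""

    def lock(self, mode: int) -> None: ...

    @property
    def locked(self) -> bool: ...

    def unlock(self) -> None: ...


class RecordingLock:
    """Performs no real locking; it only records the requested operations."""

    def __init__(self) -> None:
        self.calls: list[str] = []
        self._locked = False

    def lock(self, mode: int) -> None:
        self.calls.append(f"lock({mode})")
        self._locked = True

    @property
    def locked(self) -> bool:
        return self._locked

    def unlock(self) -> None:
        self.calls.append("unlock()")
        self._locked = False


def run_locked(manager: LockLike, mode: int) -> bool:
    """Acquire the lock, report its status, and always release it."""
    manager.lock(mode)
    try:
        return manager.locked
    finally:
        manager.unlock()


manager = RecordingLock()
was_locked = run_locked(manager, mode=1)
```

A replacement of this kind can be useful in tests, or on filesystems where real locking is not available, in the same spirit as DummyLockManager.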