blueetl.cache
Cache Manager.
Classes
| CacheManager | Cache Manager. |
| CoupledCache | Container of cached and actual configurations. |
| DummyLockManager | Dummy Lock Manager. |
| LockManager | Lock Manager. |
| LockManagerProtocol | Lock Manager Interface. |
Exceptions
| CacheError | Cache error raised when a read-only cache is written. |
- exception blueetl.cache.CacheError
Bases: Exception
Cache error raised when a read-only cache is written.
- class blueetl.cache.CacheManager(analysis_config: SingleAnalysisConfig, simulations_config: SimulationCampaign)
Bases: object
Cache Manager.
Initialize the object.
- Parameters:
analysis_config – analysis configuration.
simulations_config – simulations campaign configuration.
- close() -> None
Close the cache manager and unlock the lock directory.
After calling this method, the Cache Manager instance shouldn’t be used anymore.
- dump_features(features_dict: dict[str, DataFrame], features_config: FeaturesConfig) -> None
Write features dataframes to the cache.
The cache key is determined by the hash of features_config.
- Parameters:
features_dict – dict of features to be written.
features_config – configuration dict of the features to be written.
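The source does not say how the hash of features_config is computed. As a minimal sketch of the general idea only, a configuration dict can be turned into a stable cache key by serializing it deterministically and hashing the result. The function name `config_cache_key` and the config fields `type` and `groupby` are hypothetical and are not taken from blueetl.

```python
import hashlib
import json


def config_cache_key(config: dict) -> str:
    """Return a deterministic hex digest for a JSON-serializable config dict.

    Hypothetical helper: blueetl's real hashing scheme may differ.
    """
    # sort_keys makes the serialization independent of key insertion order
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical config contents, used only to exercise the helper
a = config_cache_key({"type": "multi", "groupby": ["simulation_id"]})
b = config_cache_key({"groupby": ["simulation_id"], "type": "multi"})
c = config_cache_key({"type": "multi", "groupby": ["circuit_id"]})

print(a == b)  # same content, different key order
print(a == c)  # different content
```

With a key of this kind, two configurations with the same content map to the same cache entry, while any change to the configuration produces a different key.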
- dump_repo(df: DataFrame, name: str) -> None
Write a specific repo dataframe to the cache.
- Parameters:
df – dataframe to be saved.
name – name of the repo dataframe.
- features_cache_needs_filter(features_config: FeaturesConfig) -> bool
Return True if the cached features need to be filtered.
This happens when the cache is used, but the actual filter is more specific than the cached filter.
- get_cached_features_checksums(features_config: FeaturesConfig) -> dict[str, dict[str, str]]
Return the cached features checksums, or an empty dict if the cache doesn’t exist.
- is_repo_cached(name: str) -> bool
Return whether a specific repo dataframe is present in the cache.
- load_features(features_config: FeaturesConfig) -> dict[str, DataFrame] | None
Load features dataframes from the cache.
The cache key is determined by the hash of features_config.
- Parameters:
features_config – configuration dict of the features to be loaded.
- Returns:
Dict of dataframes, or None if they are not cached.
- load_repo(name: str) -> DataFrame | None
Load a specific repo dataframe from the cache.
- Parameters:
name – name of the repo dataframe.
- Returns:
The loaded dataframe, or None if it’s not cached.
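Because load_repo returns None on a cache miss, a caller would typically fall back to computing the dataframe and then storing it with dump_repo. The sketch below illustrates that pattern against a hypothetical in-memory stand-in, `InMemoryRepoCache`, that only mimics the documented method names. It is not the blueetl implementation, and a plain list stands in for a DataFrame so that the example has no third-party dependencies.

```python
from typing import Any, Callable, Optional


class InMemoryRepoCache:
    """Hypothetical stand-in exposing the documented repo method names."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def is_repo_cached(self, name: str) -> bool:
        return name in self._store

    def load_repo(self, name: str) -> Optional[Any]:
        # Mirror the documented contract: None when the entry is not cached
        return self._store.get(name)

    def dump_repo(self, df: Any, name: str) -> None:
        self._store[name] = df


def load_or_compute(cache: InMemoryRepoCache, name: str, compute: Callable[[], Any]) -> Any:
    """Return the cached object, computing and caching it on a miss."""
    df = cache.load_repo(name)
    if df is None:
        df = compute()
        cache.dump_repo(df, name)
    return df


calls = []


def expensive_extraction() -> list:
    # Placeholder for a costly extraction step; records each invocation
    calls.append(1)
    return [1, 2, 3]


cache = InMemoryRepoCache()
first = load_or_compute(cache, "simulations", expensive_extraction)
second = load_or_compute(cache, "simulations", expensive_extraction)
print(len(calls), cache.is_repo_cached("simulations"))
```

The second call is served from the stand-in cache, so the extraction function runs only once.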
- property locked: bool
Return True if the cache manager is locking the cache, False otherwise.
- property readonly: bool
Return True if the cache manager is set to read-only, False otherwise.
- repo_cache_needs_filter(name: str) -> bool
Return True if the cached repo needs to be filtered.
This happens when the cache is used, but the actual filter is more specific than the cached filter.
- to_readonly() -> CacheManager
Return a read-only copy of the object.
The returned object will raise an exception if any writing method is called.
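The read-only behaviour described above can be pictured with a small stand-in. This is only a sketch of the pattern, not blueetl's code: `TinyCache` and `ReadOnlyCacheError` are hypothetical names (the latter standing in for blueetl.cache.CacheError), and the choice to share the underlying store between the original and the copy is an assumption made for the illustration.

```python
import copy


class ReadOnlyCacheError(Exception):
    """Hypothetical stand-in for blueetl.cache.CacheError."""


class TinyCache:
    """Hypothetical cache with a read-only copy, mimicking the documented names."""

    def __init__(self):
        self._store = {}
        self._readonly = False

    @property
    def readonly(self):
        return self._readonly

    def to_readonly(self):
        # Shallow copy: the clone shares the same store (an assumption here)
        clone = copy.copy(self)
        clone._readonly = True
        return clone

    def dump_repo(self, df, name):
        if self._readonly:
            raise ReadOnlyCacheError("cannot write to a read-only cache")
        self._store[name] = df

    def load_repo(self, name):
        return self._store.get(name)


writer = TinyCache()
writer.dump_repo([1, 2, 3], "simulations")

reader = writer.to_readonly()
loaded = reader.load_repo("simulations")  # reading is still allowed

try:
    reader.dump_repo([4, 5], "neurons")
    write_rejected = False
except ReadOnlyCacheError:
    write_rejected = True

print(reader.readonly, writer.readonly, write_rejected)
```

Handing the read-only copy to code that should only consume cached data turns an accidental write into an immediate error rather than a silent change to the cache.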
- class blueetl.cache.CoupledCache(cached: ConfigT | None, actual: ConfigT)
Bases: Generic[ConfigT]
Container of cached and actual configurations.
- class blueetl.cache.DummyLockManager(*args, **kwargs)
Bases: LockManagerProtocol
Dummy Lock Manager.
- lock(mode: int) -> None
Pretend to lock.
- property locked: bool
Always return True.
- unlock() -> None
Pretend to unlock.
- class blueetl.cache.LockManager(path: PathLike)
Bases: LockManagerProtocol
Lock Manager.
On Linux, the flock call is handled locally, and the underlying filesystem (GPFS) does not get any notification that locks are being set. Therefore, GPFS cannot enforce locks across nodes.
Initialize the object.
- Parameters:
path – path to an existing directory to be used for locking.
- lock(mode: int) -> None
Lock the directory.
- Parameters:
mode – LockManager.LOCK_EX for exclusive lock, or LockManager.LOCK_SH for shared lock.
- property locked: bool
Return True if the lock manager is locking the cache, False otherwise.
- unlock() -> None
Unlock the directory.
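For readers unfamiliar with flock-based directory locking, the following sketch shows one way such a lock could work with the standard fcntl module on a POSIX system. `DirLock` is a hypothetical class written for this illustration, not the LockManager source. In particular, the use of a non-blocking lock (LOCK_NB) that raises instead of waiting is an assumption of the sketch.

```python
import fcntl
import os
import tempfile


class DirLock:
    """Hypothetical flock-based lock on an existing directory (POSIX only)."""

    LOCK_EX = fcntl.LOCK_EX  # exclusive lock
    LOCK_SH = fcntl.LOCK_SH  # shared lock

    def __init__(self, path):
        self._path = path
        self._fd = None

    @property
    def locked(self):
        return self._fd is not None

    def lock(self, mode):
        if self._fd is not None:
            return  # already holding a lock
        fd = os.open(self._path, os.O_RDONLY)
        try:
            # LOCK_NB: fail immediately instead of blocking (sketch assumption)
            fcntl.flock(fd, mode | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            raise
        self._fd = fd

    def unlock(self):
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


with tempfile.TemporaryDirectory() as tmp:
    first, second = DirLock(tmp), DirLock(tmp)
    first.lock(DirLock.LOCK_EX)
    try:
        # A separate open file description cannot take a conflicting lock
        second.lock(DirLock.LOCK_EX)
        conflict = False
    except BlockingIOError:
        conflict = True
    first.unlock()
    second.lock(DirLock.LOCK_SH)  # succeeds once the exclusive lock is released
    acquired_after_release = second.locked
    second.unlock()

print(conflict, acquired_after_release)
```

As noted in the class description, a lock of this kind is only visible to processes on the same node when the filesystem does not propagate flock calls, so it does not protect against concurrent access from other nodes.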