blueetl.extract.base

Base extractor.

Classes

BaseExtractor(df, cached, filtered)

Base extractor class.

class blueetl.extract.base.BaseExtractor(df: DataFrame, cached: bool, filtered: bool)

Bases: ABC

Base extractor class.

Initialize the extractor.

Parameters:
  • df – Pandas DataFrame containing the extracted data.

  • cached – True if the data have been extracted from the cache, False otherwise.

  • filtered – True if the data have been filtered using a custom query, False otherwise.

property df: DataFrame

Return the internally wrapped dataframe.

classmethod from_pandas(df: DataFrame, query: dict | None = None, cached: bool = True) → ExtractorT

Return a new object from the given dataframe.

If a query is specified, it’s passed to etl.q and applied as a filter.

It can be overridden together with to_pandas if some columns are not serializable.

Parameters:
  • df – dataframe to load.

  • query – optional filter dictionary, passed to etl.q.

  • cached – True if the data is loaded from the cache, False otherwise.

Returns:

a new extractor instance.
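
The flow described above can be illustrated with a small sketch. It does not import blueetl: `SketchExtractor` is a stand-in that only mirrors the documented constructor and `from_pandas` signature, and `apply_query` is a hypothetical replacement for `etl.q` that supports plain equality and list membership only. Setting `filtered` from the presence of a query, and exposing `cached` and `filtered` as attributes, are assumptions made for illustration.

```python
import pandas as pd


def apply_query(df: pd.DataFrame, query: dict) -> pd.DataFrame:
    """Hypothetical stand-in for etl.q: keep rows matching every key."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in query.items():
        # Accept either a single value or a list of allowed values.
        values = wanted if isinstance(wanted, list) else [wanted]
        mask &= df[column].isin(values)
    return df[mask]


class SketchExtractor:
    """Stand-in mirroring the documented interface, not the blueetl class."""

    def __init__(self, df: pd.DataFrame, cached: bool, filtered: bool) -> None:
        self._df = df
        self.cached = cached      # assumed attribute name
        self.filtered = filtered  # assumed attribute name

    @property
    def df(self) -> pd.DataFrame:
        return self._df

    @classmethod
    def from_pandas(cls, df, query=None, cached=True):
        # Apply the optional filter first, then record whether one was used.
        if query:
            df = apply_query(df, query)
        return cls(df, cached=cached, filtered=bool(query))


raw = pd.DataFrame({"gid": [1, 2, 3, 4], "layer": ["L1", "L2", "L2", "L5"]})
everything = SketchExtractor.from_pandas(raw)
subset = SketchExtractor.from_pandas(raw, query={"layer": "L2"}, cached=False)
```

In this sketch `everything` wraps all four rows, while `subset` wraps only the rows whose `layer` is `L2` and is flagged as filtered and not cached.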

to_pandas() → DataFrame

Return a dataframe that can be serialized and stored to disk.

It should be possible to call from_pandas with the returned dataframe to create an equivalent object.

It can be overridden together with from_pandas if some columns are not serializable.

Returns:

serializable dataframe.
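
As a rough illustration of overriding `to_pandas` and `from_pandas` together, the sketch below handles a column that is assumed not to be serializable. It does not import blueetl: `TaggedExtractor` is a stand-in that mirrors the documented interface, and the `tags` column of `frozenset` objects, its comma-joined string encoding, and the `cached` and `filtered` attributes are all hypothetical choices made for this example. Query handling is left out.

```python
import pandas as pd


class TaggedExtractor:
    """Stand-in mirroring the documented interface, not the blueetl class."""

    def __init__(self, df: pd.DataFrame, cached: bool, filtered: bool) -> None:
        self._df = df
        self.cached = cached      # assumed attribute name
        self.filtered = filtered  # assumed attribute name

    @property
    def df(self) -> pd.DataFrame:
        return self._df

    def to_pandas(self) -> pd.DataFrame:
        # Encode each frozenset as a sorted, comma-joined string.
        out = self._df.copy()
        out["tags"] = out["tags"].map(lambda tags: ",".join(sorted(tags)))
        return out

    @classmethod
    def from_pandas(cls, df, query=None, cached=True):
        if query is not None:
            # Filtering is outside the scope of this sketch.
            raise NotImplementedError("query filtering is omitted here")
        # Decode the strings back to frozensets; "" means an empty set.
        restored = df.copy()
        restored["tags"] = restored["tags"].map(
            lambda text: frozenset(text.split(",")) if text else frozenset()
        )
        return cls(restored, cached=cached, filtered=False)


original = TaggedExtractor(
    pd.DataFrame({"gid": [1, 2], "tags": [frozenset({"b", "a"}), frozenset()]}),
    cached=False,
    filtered=False,
)
stored = original.to_pandas()
reloaded = TaggedExtractor.from_pandas(stored)
```

The two methods are inverses of each other here: `stored` contains only plain strings in `tags`, and `reloaded.df` holds the same sets as `original.df`, which is the equivalence the description above asks for.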