ruins.core.data_manager
The DataManager is a wrapper around all data sources used by RUINSapp.
It can be configured by any Config class and organizes or caches all data sources using classes inherited from DataSource.
This makes the read and filter interface available on all sources, no matter
where they are stored.
Using the Config to instantiate a data manager could, in principle, enable different profiles or even interaction with the frontend, although this is neither implemented nor desired at the current stage.
Example
from ruins import core
# create default config
conf = core.Config()
# create a data manager from this
dm = core.DataManager(**conf)
Of course, the data manager can also be configured without relying entirely on the config, e.g. by overriding a single setting to open it in debug mode:
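# conf.debug is False; override it without passing 'debug' twice,
# which would raise "TypeError: got multiple values for keyword argument 'debug'"
dm = core.DataManager(**{**conf, 'debug': True})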
Module Contents
Classes
DataSource – Abstract base class for data sources.
FileSource – Abstract base class for file sources.
HDF5Source – HDF5 file source. This class is used to load HDF5 files.
CSVSource – CSV file source. This class is used to load CSV files.
DATSource – DAT file source. This class is used to load .dat files.
DataManager – Main class for accessing different data sources.
- class ruins.core.data_manager.DataSource(**kwargs)
Bases: abc.ABC
Abstract base class for data sources. This provides the common interface for data sources of different source types (like file, URL, database).
- abstract read()
- abstract filter(**kwargs)
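The abstract interface only fixes the method names. The following minimal sketch shows how a concrete subclass could fulfil it. `InMemorySource` and the equality-based `filter` semantics are hypothetical and not part of RUINSapp; the `DataSource` class here is a local stand-in for the documented base class.

```python
import abc
from typing import Any, Dict, List


class DataSource(abc.ABC):
    """Local stand-in for the documented abstract base class."""

    def __init__(self, **kwargs):
        # Keep extra keyword arguments, mirroring the **kwargs signature.
        self._kwargs = kwargs

    @abc.abstractmethod
    def read(self):
        ...

    @abc.abstractmethod
    def filter(self, **kwargs):
        ...


class InMemorySource(DataSource):
    """Hypothetical source serving a list of records held in memory."""

    def __init__(self, records: List[Dict[str, Any]], **kwargs):
        super().__init__(**kwargs)
        self._records = records

    def read(self) -> List[Dict[str, Any]]:
        # Return a shallow copy so callers cannot mutate the internal list.
        return list(self._records)

    def filter(self, **kwargs) -> List[Dict[str, Any]]:
        # Assumption: keep only records whose fields equal every keyword argument.
        return [r for r in self._records if all(r.get(k) == v for k, v in kwargs.items())]


src = InMemorySource([{"station": "A", "temp": 12.1}, {"station": "B", "temp": 9.4}])
print(src.filter(station="B"))  # [{'station': 'B', 'temp': 9.4}]
```

Since both methods are abstract, `DataSource()` itself cannot be instantiated; Python raises a `TypeError` until a subclass implements `read` and `filter`.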
- class ruins.core.data_manager.FileSource(path: str, cache: bool = True, hot_load=False, **kwargs)
Bases: DataSource, abc.ABC
Abstract base class for file sources. This provides the common interface for every data source that is based on a file.
- abstract _load_source()
Load the actual source from disk.
- read()
- filter()
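The exact caching rules of FileSource are not spelled out here. Assuming that `cache=True` means "load once, then serve from memory" (as described for the DataManager's `cache` parameter) and that `hot_load=True` triggers loading on construction, the pattern could look like the sketch below. `TextSource`, `_load` and `load_count` are hypothetical names used only for illustration.

```python
import abc
import os
import tempfile


class FileSource(abc.ABC):
    """Sketch of a cached file source; the caching rules are assumptions."""

    def __init__(self, path: str, cache: bool = True, hot_load=False, **kwargs):
        self.path = path
        self.cache = cache
        self._data = None
        self.load_count = 0  # hypothetical counter, only used to show disk reads
        # Assumption: hot_load=True reads the file immediately on construction.
        if hot_load:
            self._data = self._load()

    def _load(self):
        # Hypothetical helper: count every actual disk read.
        self.load_count += 1
        return self._load_source()

    @abc.abstractmethod
    def _load_source(self):
        """Load the actual source from disk."""

    def read(self):
        if not self.cache:
            # Without caching, every call goes back to the disk.
            return self._load()
        if self._data is None:
            # First call: load once and keep the result in memory.
            self._data = self._load()
        return self._data


class TextSource(FileSource):
    """Hypothetical concrete subclass that reads a plain text file."""

    def _load_source(self):
        with open(self.path, encoding="utf-8") as f:
            return f.read()


with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "notes.txt")
    with open(fname, "w", encoding="utf-8") as f:
        f.write("hello")

    cached = TextSource(fname)
    first, second = cached.read(), cached.read()

    uncached = TextSource(fname, cache=False)
    uncached.read()
    uncached.read()

    hot = TextSource(fname, hot_load=True)  # loaded here, before any read()
    hot_reads_before = hot.load_count
    hot.read()

print(cached.load_count, uncached.load_count, hot.load_count)  # 1 2 1
```

Keeping the cache inside each source means the DataManager only has to forward the `cache` flag; it does not need to track which files were already read.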
- class ruins.core.data_manager.HDF5Source(path: str, cache: bool = True, hot_load=False, **kwargs)
Bases: FileSource
HDF5 file source. This class is used to load HDF5 files.
- _load_source() → xarray.Dataset
Load the actual source from disk.
- read() → xarray.Dataset
- class ruins.core.data_manager.CSVSource(**kwargs)
Bases: FileSource
CSV file source. This class is used to load CSV files.
- _load_source()
Load the actual source from disk.
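A `_load_source()` implementation for CSV files could be as small as the following sketch. It assumes comma-separated values with a header row and uses the standard-library `csv` module; `CSVSourceSketch` is a hypothetical stand-in, and the actual RUINSapp class may use a different reader and return a different type.

```python
import csv
import os
import tempfile
from typing import Dict, List


class CSVSourceSketch:
    """Hypothetical stand-in for CSVSource; the real class derives from FileSource."""

    def __init__(self, path: str, **kwargs):
        self.path = path

    def _load_source(self) -> List[Dict[str, str]]:
        # Assumption: comma-separated values with a header row.
        # newline="" is what the csv module expects for files it reads.
        with open(self.path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "stations.csv")
    with open(fname, "w", newline="", encoding="utf-8") as f:
        f.write("station,elevation\nA,12\nB,340\n")
    rows = CSVSourceSketch(fname)._load_source()

print(rows)  # [{'station': 'A', 'elevation': '12'}, {'station': 'B', 'elevation': '340'}]
```

With `csv.DictReader` all values stay strings; a real source would likely convert them to numeric types.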
- class ruins.core.data_manager.DATSource(**kwargs)
Bases: FileSource
DAT file source. This class is used to load .dat files.
- _load_source()
Load the actual source from disk.
- class ruins.core.data_manager.DataManager(datapath: str = None, cache: bool = True, hot_load=False, debug: bool = False, **kwargs)
Bases: collections.abc.Mapping
Main class for accessing different data sources.
The DataManager holds and manages all data sources. The default behavior is to scan the specified path for files with known file extensions and cache them in memory.
- Parameters:
datapath (str) – The location where the data is stored. The class loads all sources found there and makes them accessible through DataSource classes.
cache (bool) – Passed on to the DataSource classes. If true, each source is read only once and then kept in memory until the DataManager is destroyed.
include_mimes (dict) – A dictionary mapping file extensions to their corresponding DataSource class. The DataManager ignores any file type that is not listed. The include_mimes setting can be overridden by passing filenames directly.
- from_config(datapath: str = None, cache: bool = True, hot_load: bool = False, debug: bool = False, **kwargs) → None
Initialize the DataManager from a Config object.
- property datasources: List[DataSource]
- _infer_from_folder() → None
Read all files from the datapath as specified on instantiation. Calls add_source() on each file.
- add_source(path: str, not_exists: str = 'raise') → None
Add a file as a data source to the DataManager. The file is only managed if it has an allowed file extension. A file with the same name as an existing source overwrites it, even if the two files have different extensions.
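The folder scan and the overwrite rule described for `_infer_from_folder()` and `add_source()` can be sketched as follows. This is an illustration under assumptions, not the RUINSapp implementation: `ManagerSketch` and the placeholder classes are hypothetical, `include_mimes` keys are assumed to be extensions without the leading dot, and any `not_exists` value other than `'raise'` is assumed to skip missing files.

```python
import os
import tempfile
from typing import Dict, Type


class FileSource:
    """Placeholder for the file-based DataSource classes (illustration only)."""

    def __init__(self, path: str, **kwargs):
        self.path = path


class CSVSource(FileSource):
    pass


class DATSource(FileSource):
    pass


class ManagerSketch:
    """Hypothetical sketch of _infer_from_folder() and add_source()."""

    def __init__(self, datapath: str, include_mimes: Dict[str, Type[FileSource]]):
        self.datapath = datapath
        # Assumption: keys are extensions without the leading dot.
        self.include_mimes = include_mimes
        self._sources: Dict[str, FileSource] = {}
        self._infer_from_folder()

    def _infer_from_folder(self) -> None:
        # Call add_source() on every regular file in datapath (sorted for determinism).
        for name in sorted(os.listdir(self.datapath)):
            path = os.path.join(self.datapath, name)
            if os.path.isfile(path):
                self.add_source(path)

    def add_source(self, path: str, not_exists: str = "raise") -> None:
        if not os.path.exists(path):
            if not_exists == "raise":
                raise FileNotFoundError(path)
            return  # assumption: any other value skips the missing file
        stem, ext = os.path.splitext(os.path.basename(path))
        source_cls = self.include_mimes.get(ext.lstrip(".").lower())
        if source_cls is None:
            return  # extension not listed in include_mimes: ignore the file
        # Sources are keyed by name without extension, so a later file with the
        # same name replaces an earlier one even if the extension differs.
        self._sources[stem] = source_cls(path)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("stations.csv", "stations.dat", "readme.md"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("")
    dm = ManagerSketch(tmp, include_mimes={"csv": CSVSource, "dat": DATSource})

print(list(dm._sources), type(dm._sources["stations"]).__name__)  # ['stations'] DATSource
```

Here `readme.md` is ignored because `md` is not listed, and `stations.dat` replaces `stations.csv` because both share the name `stations`.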
- resolve_class_name(cls_name: str) → Type[DataSource]
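No docstring is given for `resolve_class_name()`; its signature only says that it turns a class name into a DataSource subclass. One possible approach, purely an assumption, is to search the subclass tree of `DataSource` by name. The class hierarchy below is a local stand-in, and `_all_subclasses` is a hypothetical helper.

```python
import abc
from typing import Iterator, Type


class DataSource(abc.ABC):
    """Stand-in for the documented base class."""


class FileSource(DataSource):
    pass


class HDF5Source(FileSource):
    pass


class CSVSource(FileSource):
    pass


def _all_subclasses(base: type) -> Iterator[type]:
    # Walk the inheritance tree, since __subclasses__() only lists direct children.
    for sub in base.__subclasses__():
        yield sub
        yield from _all_subclasses(sub)


def resolve_class_name(cls_name: str) -> Type[DataSource]:
    # Return the first DataSource subclass whose __name__ matches.
    for sub in _all_subclasses(DataSource):
        if sub.__name__ == cls_name:
            return sub
    raise ValueError(f"{cls_name!r} is not a known DataSource class")


print(resolve_class_name("HDF5Source").__name__)  # HDF5Source
```

Searching the subclass tree avoids a hand-maintained registry: any newly defined DataSource subclass becomes resolvable as soon as its module is imported.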
- __len__()
Return the number of managed data sources.
- __iter__()
Iterate over all dataset names.
- __getitem__(key: str) → DataSource
Return the requested data source.
- __repr__()
Return repr(self).
- __str__()
Return str(self).
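Because DataManager derives from `collections.abc.Mapping`, implementing `__len__`, `__iter__` and `__getitem__` is enough to get `keys()`, `items()`, `get()` and the `in` operator for free. The sketch below demonstrates this with a hypothetical `SourceMapping` that holds placeholder strings instead of real DataSource objects.

```python
from collections.abc import Mapping
from typing import Dict, Iterator


class SourceMapping(Mapping):
    """Hypothetical read-only mapping from dataset names to source objects."""

    def __init__(self, sources: Dict[str, object]):
        self._sources = dict(sources)

    def __len__(self) -> int:
        # Number of managed data sources.
        return len(self._sources)

    def __iter__(self) -> Iterator[str]:
        # Iterate over dataset names, like DataManager.__iter__.
        return iter(self._sources)

    def __getitem__(self, key: str) -> object:
        # Raises KeyError for unknown names, as Mapping expects.
        return self._sources[key]

    def __repr__(self) -> str:
        return f"{type(self).__name__}({list(self._sources)})"


sm = SourceMapping({"weather": "<HDF5Source>", "stations": "<CSVSource>"})
print(len(sm), list(sm), "weather" in sm)  # 2 ['weather', 'stations'] True
```

Deriving from `Mapping` rather than `dict` keeps the container read-only from the caller's side: there is no `__setitem__`, so new sources can only enter through `add_source()`.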