ruins.core.data_manager#

The DataManager is a wrapper around all data sources used by RUINSapp. It can be configured by any Config class and organizes and caches all data sources using classes that inherit from DataSource. This makes the read and filter interface available on all sources, no matter where they are stored. Instantiating a data manager from the Config can in principle enable different profiles, or even an interaction with the frontend, although this is neither implemented nor desired at the current stage.

Example

from ruins import core

# create default config
conf = core.Config()

# create a data manager from this
dm = core.DataManager(**conf)

Of course, individual settings from the config can also be overridden on instantiation, e.g. to open the data manager in debug mode:

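# conf has conf.debug=False; override it here.
# Merging into a dict first avoids a duplicate-keyword TypeError
# in case conf already provides a debug entry.
dm = core.DataManager(**{**conf, 'debug': True})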

Module Contents#

Classes#

DataSource

Abstract base class for data sources. This provides the common interface for data sources of different source types (like file, URL, database).

FileSource

Abstract base class for file sources. This provides the common interface for every data source that is based on a file.

HDF5Source

HDF5 file sources. This class is used to load HDF5 files.

CSVSource

CSV file source. This class is used to load CSV files.

DATSource

DAT file source. This class is used to load .dat files.

DataManager

Main class for accessing different data sources.

class ruins.core.data_manager.DataSource(**kwargs)#

Bases: abc.ABC

Abstract base class for data sources. This provides the common interface for data sources of different source types (like file, URL, database).

abstract read()#
abstract filter(**kwargs)#
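
The read and filter interface can be illustrated with a small, self-contained stand-in. This is a sketch, not the ruins implementation: the `DataSource` class below only mirrors the two documented abstract methods, `InMemorySource` is a hypothetical subclass, and the meaning of the `filter` keyword arguments (field equality) is an assumption, since the documentation does not specify it.

```python
import abc


class DataSource(abc.ABC):
    """Stand-in mirroring the documented interface (not the ruins class)."""

    def __init__(self, **kwargs):
        self._kwargs = kwargs

    @abc.abstractmethod
    def read(self):
        """Return the full content of the source."""

    @abc.abstractmethod
    def filter(self, **kwargs):
        """Return a subset of the source."""


class InMemorySource(DataSource):
    """Hypothetical source that serves records from a list of dicts."""

    def __init__(self, records, **kwargs):
        super().__init__(**kwargs)
        self._records = list(records)

    def read(self):
        # Return a copy so callers cannot mutate the stored records.
        return list(self._records)

    def filter(self, **kwargs):
        # Assumption: keep records whose fields equal every keyword argument.
        return [
            rec for rec in self._records
            if all(rec.get(key) == value for key, value in kwargs.items())
        ]


source = InMemorySource([
    {"station": "A", "year": 2000},
    {"station": "B", "year": 2000},
    {"station": "A", "year": 2001},
])
station_a = source.filter(station="A")
```

Because both methods are abstract, the base class itself cannot be instantiated; only subclasses that implement `read` and `filter` can.
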
class ruins.core.data_manager.FileSource(path: str, cache: bool = True, hot_load=False, **kwargs)#

Bases: DataSource, abc.ABC

Abstract base class for file sources. This provides the common interface for every data source that is based on a file.

abstract _load_source()#

Method to load the actual source from disk.

read()#
filter()#
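
How `cache` and `_load_source()` might interact can be sketched as follows. The classes `CachedFileSource` and `TextSource` are hypothetical stand-ins, not the ruins classes. The sketch assumes that `hot_load` means "load at instantiation instead of on first read", which the documentation does not state.

```python
import abc
import os
import tempfile


class CachedFileSource(abc.ABC):
    """Hypothetical stand-in for a file source with an in-memory cache."""

    def __init__(self, path: str, cache: bool = True, hot_load: bool = False):
        self.path = path
        self.cache = cache
        self._data = None
        # Assumption: hot_load triggers loading right away.
        if hot_load:
            self._data = self._load_source()

    @abc.abstractmethod
    def _load_source(self):
        """Load the actual source from disk."""

    def read(self):
        if not self.cache:
            # Without caching, every read goes back to the file.
            return self._load_source()
        if self._data is None:
            self._data = self._load_source()
        return self._data


class TextSource(CachedFileSource):
    """Hypothetical plain-text source that counts its disk reads."""

    load_count = 0

    def _load_source(self):
        self.load_count += 1
        with open(self.path) as f:
            return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("hello")

    cached = TextSource(path, cache=True)
    cached.read()
    cached.read()

    uncached = TextSource(path, cache=False)
    uncached.read()
    uncached.read()
```

In this sketch the cached source touches the disk once and the uncached source twice, which matches the documented intent of reading a source only once when `cache` is true.
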
class ruins.core.data_manager.HDF5Source(path: str, cache: bool = True, hot_load=False, **kwargs)#

Bases: FileSource

HDF5 file sources. This class is used to load HDF5 files.

_load_source() -> xarray.Dataset#

Method to load the actual source from disk.

read() -> xarray.Dataset#
class ruins.core.data_manager.CSVSource(**kwargs)#

Bases: FileSource

CSV file source. This class is used to load CSV files.

_load_source()#

Method to load the actual source from disk.

class ruins.core.data_manager.DATSource(**kwargs)#

Bases: FileSource

DAT file source. This class is used to load .dat files.

_load_source()#

Method to load the actual source from disk.

class ruins.core.data_manager.DataManager(datapath: str = None, cache: bool = True, hot_load=False, debug: bool = False, **kwargs)#

Bases: collections.abc.Mapping

Main class for accessing different data sources.

The DataManager holds and manages all data sources. The default behavior is to scan the specified path for files with a known file extension and cache them in memory.

Parameters:
  • datapath (str) – A location where the data is stored. The class will load all sources there and make them accessible through DataSource classes.

  • cache (bool) – Will be passed to the DataSource classes. If true, the source will only be read once and then kept in memory until the DataManager is destroyed.

  • include_mimes (dict) – A dictionary of file extensions and their corresponding DataSource. If something is not listed, the DataManager will ignore the file type. The include_mimes can be overwritten by passing filenames directly.
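
The extension lookup described for include_mimes could look roughly like the sketch below. It rests on assumptions: the keys are taken to be bare file extensions and the values names of DataSource classes (suggested by `resolve_class_name`, but not confirmed here). The extensions shown are purely illustrative, and `select_source` is a hypothetical helper, not part of ruins.

```python
import os

# Assumption: extension -> name of the DataSource class that handles it.
include_mimes = {
    "nc": "HDF5Source",
    "csv": "CSVSource",
    "dat": "DATSource",
}


def select_source(path, mimes):
    """Return the source class name for path, or None if the extension is not listed."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return mimes.get(extension)


csv_source = select_source("data/weather.csv", include_mimes)
ignored = select_source("data/notes.txt", include_mimes)
```

Files whose extension is not listed yield `None` here, which corresponds to the documented behavior of ignoring unlisted file types.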

read(name_or_file: str)#
from_config(datapath: str = None, cache: bool = True, hot_load: bool = False, debug: bool = False, **kwargs) -> None#

Initialize the DataManager from a Config object.

property datapath: str#
property datasources: List[DataSource]#
_infer_from_folder() -> None#

Read all files from the datapath as specified on instantiation. Calls add_source() on each file.

add_source(path: str, not_exists: str = 'raise') -> None#

Add a file as a data source to the DataManager. The file is only managed if it has an allowed file extension. A source that is already registered under the same file name is overwritten, even if the two files have different extensions.
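
The overwrite behavior follows naturally if sources are keyed by the file name without its extension. The sketch below illustrates that assumption with a hypothetical `SourceRegistry`; it is not the ruins implementation and does not model the `not_exists` parameter.

```python
import os


class SourceRegistry:
    """Hypothetical registry keyed by file name without extension."""

    def __init__(self, allowed=("csv", "dat", "nc")):
        # Illustrative set of allowed extensions.
        self.allowed = set(allowed)
        self.sources = {}

    def add_source(self, path):
        name, ext = os.path.splitext(os.path.basename(path))
        if ext.lstrip(".").lower() not in self.allowed:
            # Files with an extension that is not allowed are not managed.
            return
        # A later file with the same base name replaces the earlier one.
        self.sources[name] = path


registry = SourceRegistry()
registry.add_source("data/weather.csv")
registry.add_source("data/weather.dat")
registry.add_source("data/readme.txt")
```

After these calls the registry holds a single entry named `weather`, pointing at the `.dat` file added last.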

resolve_class_name(cls_name: str) -> Type[DataSource]#
__len__()#

Return the number of managed data sources

__iter__()#

Iterate over all dataset names

__getitem__(key: str) -> DataSource#

Return the requested datasource
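
Because the class derives from collections.abc.Mapping, implementing `__len__`, `__iter__` and `__getitem__` is enough to obtain the rest of the read-only dictionary interface. The sketch below demonstrates this with a hypothetical `MiniManager` that stores plain strings instead of DataSource objects.

```python
from collections.abc import Mapping


class MiniManager(Mapping):
    """Hypothetical stand-in that implements only the three required methods."""

    def __init__(self, sources):
        self._sources = dict(sources)

    def __len__(self):
        return len(self._sources)

    def __iter__(self):
        return iter(self._sources)

    def __getitem__(self, key):
        return self._sources[key]


manager = MiniManager({"weather": "weather-source", "climate": "climate-source"})

names = sorted(manager)              # iteration yields the dataset names
has_weather = "weather" in manager   # __contains__ is provided by Mapping
pairs = dict(manager.items())        # items() is provided by Mapping
missing = manager.get("soil")        # get() returns None for unknown names
```

The Mapping base class supplies `keys()`, `items()`, `values()`, `get()` and `in` on top of the three methods defined here.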

__repr__()#

Return repr(self).

__str__()#

Return str(self).