BaseDataset#

class osekit.core_api.base_dataset.BaseDataset(data: list[TData], name: str | None = None, suffix: str = '', folder: Path | None = None)#

Base class for Dataset objects.

Datasets are collections of Data, with methods that simplify repeated operations on the data.

Instantiate a Dataset object from the Data objects.

property base_name: str#

Name of the dataset without suffix.

property begin: Timestamp#

Begin of the first data object.

property data_duration: Timedelta#

Return the most frequent duration among the data of this dataset.

The duration is rounded to the nearest second.
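The behaviour described above can be illustrated with a small standard-library sketch. This is not the library's implementation: the helper name `most_frequent_duration` is hypothetical, and the sketch assumes each duration is rounded to the nearest second before the most frequent value is picked.

```python
from collections import Counter
from datetime import timedelta


def most_frequent_duration(durations: list[timedelta]) -> timedelta:
    """Hypothetical helper: most common duration, rounded to the second."""
    # Assumption: rounding is applied to every duration before counting.
    rounded = [timedelta(seconds=round(d.total_seconds())) for d in durations]
    # most_common(1) returns a list holding the (value, count) pair of the top value.
    return Counter(rounded).most_common(1)[0][0]


durations = [
    timedelta(seconds=9.8),
    timedelta(seconds=10.2),
    timedelta(seconds=10.0),
    timedelta(seconds=3.4),  # e.g. a shorter data object at the end of a dataset
]
print(most_frequent_duration(durations))  # 0:00:10
```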

property end: Timestamp#

End of the last data object.

property files: set[TFile]#

All files referred to by the Dataset.

property folder: Path#

Folder in which the dataset files are located or to be written.

classmethod from_dict(dictionary: dict) Self#

Deserialize a BaseDataset from a dictionary.

Parameters#

dictionary: dict

The serialized dictionary representing the BaseDataset.

Returns#

BaseDataset

The deserialized BaseDataset.

classmethod from_files(files: list[TFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, overlap: float = 0.0, name: str | None = None, **kwargs) Self#

Return a Dataset object from a list of Files.

Parameters#

files: list[TFile]

The list of files contained in the Dataset.

begin: Timestamp | None

Begin of the first data object. Defaulted to the begin of the first file.

end: Timestamp | None

End of the last data object. Defaulted to the end of the last file.

mode: Literal["files", "timedelta_total", "timedelta_file"]

Mode of creation of the dataset data from the original files. "files": one data object will be created for each file. "timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration equal to data_duration will be created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object will then be created from the beginning of the next original file, and so on.

data_duration: Timedelta | None

Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Else, one data object will cover the whole time period.

overlap: float

Overlap percentage between consecutive data.

name: str|None

Name of the dataset.

kwargs:

Keyword arguments to pass to the cls.data_from_files() method.

Returns#

Self:

The Dataset object.
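The difference between the "timedelta_total" and "timedelta_file" modes can be sketched with plain `datetime` objects. This is a simplified reading of the parameter descriptions above, not the library's code. The function names are hypothetical, files are represented as `(begin, end)` tuples separated by gaps, `overlap` is assumed to be a fraction in `[0, 1)`, and a final window that runs past `end` is left untruncated.

```python
from datetime import datetime, timedelta


def windows_timedelta_total(begin, end, duration, overlap=0.0):
    """Hypothetical sketch: fixed-length windows laid out from begin to end."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = duration * (1 - overlap)  # distance between consecutive window starts
    windows, start = [], begin
    while start < end:
        windows.append((start, start + duration))
        start += hop
    return windows


def windows_timedelta_file(files, duration, overlap=0.0):
    """Hypothetical sketch: the window grid restarts at the begin of each file.

    Assumes the files are (begin, end) tuples separated by gaps.
    """
    windows = []
    for file_begin, file_end in sorted(files):
        windows.extend(
            windows_timedelta_total(file_begin, file_end, duration, overlap)
        )
    return windows


t0 = datetime(2024, 1, 1)
ten_s = timedelta(seconds=10)

total = windows_timedelta_total(t0, t0 + timedelta(seconds=30), ten_s)
print(len(total))  # 3

# Two 25 s files with a gap between them.
files = [
    (t0, t0 + timedelta(seconds=25)),
    (t0 + timedelta(seconds=60), t0 + timedelta(seconds=85)),
]
per_file = windows_timedelta_file(files, ten_s)
print([(b - t0).total_seconds() for b, _ in per_file])
# [0.0, 10.0, 20.0, 60.0, 70.0, 80.0]
```

In this sketch, "timedelta_total" ignores file boundaries entirely, whereas "timedelta_file" realigns the windows on the begin of the second file instead of continuing the grid through the gap.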

classmethod from_folder(folder: Path, strptime_format: str | None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', overlap: float = 0.0, data_duration: Timedelta | None = None, first_file_begin: Timestamp | None = None, name: str | None = None, **kwargs) Self#

Return a Dataset from a folder containing the base files.

Parameters#

folder: Path

The folder containing the files.

strptime_format: str | None

The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.

begin: Timestamp | None

The begin of the dataset. Defaulted to the begin of the first file.

end: Timestamp | None

The end of the dataset. Defaulted to the end of the last file.

timezone: str | pytz.timezone | None

The timezone in which the file should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps' timezone will be converted from the parsed timezone to the specified timezone.

mode: Literal["files", "timedelta_total", "timedelta_file"]

Mode of creation of the dataset data from the original files. "files": one data object will be created for each file. "timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration equal to data_duration will be created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object will then be created from the beginning of the next original file, and so on.

overlap: float

Overlap percentage between consecutive data.

data_duration: Timedelta | None

Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.

first_file_begin: Timestamp | None

Timestamp of the first audio file being processed. Will be ignored if strptime_format is specified.

name: str|None

Name of the dataset.

kwargs:

Keyword arguments to pass to the cls.from_files() method.

Returns#

Self:

The dataset.
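The interplay between strptime_format and first_file_begin can be sketched as follows. This is an illustration under stated assumptions rather than the library's logic: the helper `file_begins` is hypothetical, each file stem is assumed to consist of the timestamp only, and the sketch raises an error when neither argument is given (what the real method does in that case is not documented here).

```python
from datetime import datetime, timedelta


def file_begins(stems, durations, strptime_format=None, first_file_begin=None):
    """Hypothetical helper returning the begin timestamp of each file."""
    if strptime_format is not None:
        # A format is given: parse each stem and ignore first_file_begin.
        return [datetime.strptime(stem, strptime_format) for stem in stems]
    if first_file_begin is None:
        # Choice made for this sketch only.
        raise ValueError("Provide strptime_format or first_file_begin.")
    # No format: chain the files, each one starting where the previous one ends.
    begins, current = [], first_file_begin
    for duration in durations:
        begins.append(current)
        current += duration
    return begins


stems = ["20240101_120000", "20240101_120010"]
durations = [timedelta(seconds=10), timedelta(seconds=15)]

parsed = file_begins(stems, durations, strptime_format="%Y%m%d_%H%M%S")
print(parsed[1])  # 2024-01-01 12:00:10

t0 = datetime(2024, 1, 1)
chained = file_begins(stems, durations, first_file_begin=t0)
print(chained[1])  # 2024-01-01 00:00:10
```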

classmethod from_json(file: Path) Self#

Deserialize a BaseDataset from a JSON file.

Parameters#

file: Path

Path to the serialized JSON file representing the BaseDataset.

Returns#

BaseDataset

The deserialized BaseDataset.

property has_default_name: bool#

Return True if the dataset has a default name, False if it has a given name.

move_files(folder: Path) None#

Move the dataset files to the destination folder.

Parameters#

folder: Path

Destination folder in which the dataset files will be moved.

property name: str#

Name of the dataset with suffix.

property suffix: str#

Suffix that is applied to the name of the dataset.

This is used by the public API to suffix multiple core_api datasets that are created simultaneously and share the same name with their specific type, e.g. _audio or _spectro.

to_dict() dict#

Serialize a BaseDataset to a dictionary.

Returns#

dict:

The serialized dictionary representing the BaseDataset.

write(folder: Path, first: int = 0, last: int | None = None, *, link: bool = False) None#

Write all data objects in the specified folder.

Parameters#

folder: Path

Folder in which to write the data.

link: bool

If True, the Data will be bound to the written file. Its items will be replaced with a single item, which will match the whole new File.

first: int

Index of the first data object to write.

last: int | None

Index after the last data object to write.
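The first and last parameters read like the bounds of a Python slice, with last being exclusive. The sketch below illustrates that reading on a plain list. The helper `select_for_write` is hypothetical and the slice semantics are an assumption drawn from the parameter descriptions, not from the library's source.

```python
def select_for_write(data, first=0, last=None):
    """Hypothetical helper: pick the data objects that would be written."""
    # Assumption: first is inclusive, last is exclusive, None means "to the end".
    return data[first:last]


data = ["d0", "d1", "d2", "d3", "d4"]

print(select_for_write(data))           # ['d0', 'd1', 'd2', 'd3', 'd4']
print(select_for_write(data, 1, 3))     # ['d1', 'd2']
print(select_for_write(data, first=3))  # ['d3', 'd4']
```

Under this reading, a large dataset could be written in chunks by calling write() repeatedly with consecutive first and last values.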

write_json(folder: Path) None#

Write a serialized BaseDataset to a JSON file.
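The four serialization methods (to_dict, from_dict, write_json, from_json) form a round trip. The toy class below shows that pattern with the standard library only. `ToyDataset`, its dictionary layout, and the `<name>.json` output file name are all assumptions made for this sketch; the actual serialized structure of a BaseDataset is not described in this section.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset, used to show the round trip."""

    name: str
    data: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"name": self.name, "data": self.data}

    @classmethod
    def from_dict(cls, dictionary: dict) -> "ToyDataset":
        return cls(name=dictionary["name"], data=list(dictionary["data"]))

    def write_json(self, folder: Path) -> Path:
        # Assumption: the file is named after the dataset.
        folder.mkdir(parents=True, exist_ok=True)
        file = folder / f"{self.name}.json"
        file.write_text(json.dumps(self.to_dict()))
        return file

    @classmethod
    def from_json(cls, file: Path) -> "ToyDataset":
        return cls.from_dict(json.loads(file.read_text()))


original = ToyDataset(name="demo", data=[{"begin": "2024-01-01T00:00:00"}])

with tempfile.TemporaryDirectory() as tmp:
    json_file = original.write_json(Path(tmp))
    restored = ToyDataset.from_json(json_file)

print(restored == original)  # True
```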