BaseDataset#

class osekit.core_api.base_dataset.BaseDataset(data: list[TData], name: str | None = None, suffix: str = '', folder: Path | None = None)#

Base class for Dataset objects.

Datasets are collections of Data, with methods that simplify repeated operations on the data.

Instantiate a Dataset object from the Data objects.

property base_name: str#

Name of the dataset without suffix.

property begin: Timestamp#

Begin of the first data object.

property data_duration: Timedelta#

Return the most frequent duration among durations of the data of this dataset, rounded to the nearest second.
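As a rough illustration of what "most frequent duration, rounded to the nearest second" means, here is a minimal standard-library sketch. The helper `most_frequent_duration` is hypothetical and not part of osekit. It assumes each duration is rounded before counting, which the description above does not confirm.

```python
from collections import Counter
from datetime import timedelta


def most_frequent_duration(durations: list[timedelta]) -> timedelta:
    """Return the most common duration after rounding each one to the nearest second.

    Hypothetical helper: rounding *before* counting is an assumption.
    """
    rounded = [timedelta(seconds=round(d.total_seconds())) for d in durations]
    # most_common(1) returns a one-element list of (value, count) pairs.
    return Counter(rounded).most_common(1)[0][0]


durations = [
    timedelta(seconds=9.8),
    timedelta(seconds=10.2),
    timedelta(seconds=10),
    timedelta(seconds=3.4),
]
print(most_frequent_duration(durations))  # 0:00:10
```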

property end: Timestamp#

End of the last data object.

property files: set[TFile]#

All files referred to by the Dataset.

property folder: Path#

Folder in which the dataset files are located or to be written.

classmethod from_dict(dictionary: dict) BaseDataset#

Deserialize a BaseDataset from a dictionary.

Parameters#

dictionary: dict

The serialized dictionary representing the BaseDataset.

Returns#

BaseDataset

The deserialized BaseDataset.

classmethod from_files(files: list[TFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None) BaseDataset#

Return a BaseDataset object from a list of Files.

Parameters#

files: list[TFile]

The list of files contained in the Dataset.

begin: Timestamp | None

Begin of the first data object. Defaulted to the begin of the first file.

end: Timestamp | None

End of the last data object. Defaulted to the end of the last file.

mode: Literal["files", "timedelta_total", "timedelta_file"]

Mode of creation of the dataset data from the original files. "files": one data object will be created for each file. "timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration equal to data_duration will be created from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.

data_duration: Timedelta | None

Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.

name: str | None

Name of the dataset.

Returns#

BaseDataset[TItem, TFile]: The BaseDataset object.
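The difference between the "timedelta_total" and "timedelta_file" modes can be sketched with plain `datetime` objects. This is only one reading of the mode descriptions above, not osekit's implementation. `split_total` and `split_per_file` are hypothetical helpers, files are modelled as `(begin, end)` tuples, and two points are assumptions: the last window is not truncated at `end`, and `split_per_file` always starts at the beginning of the first file.

```python
from datetime import datetime, timedelta

Window = tuple[datetime, datetime]


def split_total(begin: datetime, end: datetime, duration: timedelta) -> list[Window]:
    """'timedelta_total'-like split: back-to-back windows from begin until end."""
    windows = []
    start = begin
    while start < end:
        # Assumption: the last window may extend past `end`.
        windows.append((start, start + duration))
        start += duration
    return windows


def split_per_file(files: list[Window], duration: timedelta) -> list[Window]:
    """'timedelta_file'-like split: restart at the next file after a gap."""
    files = sorted(files)
    windows = []
    start = files[0][0]
    while True:
        windows.append((start, start + duration))
        start += duration
        if any(b <= start < e for b, e in files):
            continue  # the next window still begins inside a file
        # The next window would begin outside every file: jump to the next file.
        later = [b for b, _ in files if b > start]
        if not later:
            break
        start = later[0]
    return windows


day = datetime(2024, 1, 1)
minutes = lambda m: timedelta(minutes=m)
# Two 10-minute files separated by a 5-minute gap.
files = [(day, day + minutes(10)), (day + minutes(15), day + minutes(25))]

total = split_total(day, day + minutes(25), minutes(4))
per_file = split_per_file(files, minutes(4))
print([(w[0] - day) // minutes(1) for w in total])     # [0, 4, 8, 12, 16, 20, 24]
print([(w[0] - day) // minutes(1) for w in per_file])  # [0, 4, 8, 15, 19, 23]
```

In this sketch the fourth "timedelta_total" window begins at minute 12, inside the gap, whereas the "timedelta_file" split realigns on the second file at minute 15.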

classmethod from_folder(folder: Path, strptime_format: str, file_class: type[TFile] = <class 'osekit.core_api.base_file.BaseFile'>, supported_file_extensions: list[str] | None = None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None) BaseDataset#

Return a BaseDataset from a folder containing the base files.

Parameters#

folder: Path

The folder containing the files.

strptime_format: str

The strptime format of the timestamps in the file names.

file_class: type[TFile]

Derived type of BaseFile used to instantiate the dataset.

supported_file_extensions: list[str] | None

List of supported file extensions for parsing TFiles.

begin: Timestamp | None

The begin of the dataset. Defaulted to the begin of the first file.

end: Timestamp | None

The end of the dataset. Defaulted to the end of the last file.

timezone: str | pytz.timezone | None

The timezone in which the file should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps' timezone will be converted from the parsed timezone to the specified timezone.

mode: Literal["files", "timedelta_total", "timedelta_file"]

Mode of creation of the dataset data from the original files. "files": one data object will be created for each file. "timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration equal to data_duration will be created from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.

data_duration: Timedelta | None

Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.

name: str | None

Name of the dataset.

Returns#

BaseDataset:

The base dataset.
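To make the strptime_format and timezone parameters more concrete, the following standard-library sketch parses a timestamp out of a file name and then applies the timezone rules described above. It is not osekit's parser. `parse_timestamp`, the directive table, and the example file names are all hypothetical, and treating "localize" as attaching the timezone to a naive timestamp is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Regex fragments for a few common strptime directives (not exhaustive).
DIRECTIVE_PATTERNS = {
    "%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}",
    "%H": r"\d{2}", "%M": r"\d{2}", "%S": r"\d{2}",
    "%z": r"[+-]\d{4}",
}


def parse_timestamp(filename: str, strptime_format: str,
                    tz: Optional[tzinfo] = None) -> datetime:
    """Find the part of `filename` matching `strptime_format` and parse it."""
    pattern = re.escape(strptime_format)
    for directive, regex in DIRECTIVE_PATTERNS.items():
        pattern = pattern.replace(re.escape(directive), regex)
    match = re.search(pattern, filename)
    if match is None:
        raise ValueError(f"No timestamp matching {strptime_format!r} in {filename!r}")
    timestamp = datetime.strptime(match.group(), strptime_format)
    if tz is None:
        return timestamp  # left as parsed (tz-naive when the format has no %z)
    if timestamp.tzinfo is None:
        return timestamp.replace(tzinfo=tz)  # localize a naive timestamp
    return timestamp.astimezone(tz)  # convert from the parsed timezone


naive = parse_timestamp("rec_2024-01-01_12-00-00.wav", "%Y-%m-%d_%H-%M-%S")
converted = parse_timestamp(
    "rec_2024-01-01_12-00-00+0200.wav", "%Y-%m-%d_%H-%M-%S%z", tz=timezone.utc
)
print(naive)      # 2024-01-01 12:00:00
print(converted)  # 2024-01-01 10:00:00+00:00
```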

classmethod from_json(file: Path) BaseDataset#

Deserialize a BaseDataset from a JSON file.

Parameters#

file: Path

Path to the serialized JSON file representing the BaseDataset.

Returns#

BaseDataset

The deserialized BaseDataset.

property has_default_name: bool#

Return True if the dataset has a default name, False if it has a given name.

move_files(folder: Path) None#

Move the dataset files to the destination folder.

Parameters#

folder: Path

Destination folder to which the dataset files will be moved.

property name: str#

Name of the dataset with suffix.

property suffix: str#

Suffix that is applied to the name of the dataset.

This is used by the public API to suffix multiple core_api datasets that are created simultaneously and share the same name with their specific type, e.g. _audio or _spectro.

to_dict() dict#

Serialize a BaseDataset to a dictionary.

Returns#

dict:

The serialized dictionary representing the BaseDataset.

write(folder: Path, link: bool = False, first: int = 0, last: int | None = None) None#

Write all data objects in the specified folder.

Parameters#

folder: Path

Folder in which to write the data.

link: bool

If True, the Data will be bound to the written file. Its items will be replaced with a single item, which will match the whole new File.

first: int

Index of the first data object to write.

last: int | None

Index after the last data object to write.
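Since last is described as the index after the last data object to write, first and last read like the bounds of a half-open Python slice. The sketch below illustrates that reading on a plain list. `select_for_writing` is a hypothetical helper, and the idea that write selects its data this way is an assumption based only on the parameter descriptions.

```python
from __future__ import annotations


def select_for_writing(data: list, first: int = 0, last: int | None = None) -> list:
    """Return the items a write(first=..., last=...) call would cover (assumed)."""
    # Half-open slice: `first` is included, `last` is excluded.
    # last=None means "up to the end of the list".
    return data[first:last]


data = ["d0", "d1", "d2", "d3", "d4"]
print(select_for_writing(data, first=1, last=3))  # ['d1', 'd2']
print(select_for_writing(data, first=2))          # ['d2', 'd3', 'd4']
```

Under this reading, consecutive calls with (0, 2), (2, 4) and (4, None) would cover every data object exactly once, which is convenient for splitting the work into batches.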

write_json(folder: Path) None#

Write a serialized BaseDataset to a JSON file.
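The four serialization methods (to_dict, from_dict, write_json, from_json) suggest a round trip through a dictionary and a JSON file. The sketch below shows that pattern on a hypothetical stand-in class, `TinyDataset`, rather than on BaseDataset itself. The dictionary layout and the `<name>.json` file name are assumptions made for illustration and are not taken from osekit.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TinyDataset:
    """Hypothetical stand-in for a dataset; not the osekit class."""

    name: str
    items: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"name": self.name, "items": list(self.items)}

    @classmethod
    def from_dict(cls, dictionary: dict) -> TinyDataset:
        return cls(name=dictionary["name"], items=list(dictionary["items"]))

    def write_json(self, folder: Path) -> None:
        folder.mkdir(parents=True, exist_ok=True)
        # Assumed file name: "<name>.json" inside the target folder.
        (folder / f"{self.name}.json").write_text(json.dumps(self.to_dict()))

    @classmethod
    def from_json(cls, file: Path) -> TinyDataset:
        return cls.from_dict(json.loads(file.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = TinyDataset(name="demo", items=["a", "b"])
    original.write_json(Path(tmp))
    restored = TinyDataset.from_json(Path(tmp) / "demo.json")

print(restored == original)  # True
```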