BaseDataset#

class osekit.core_api.base_dataset.BaseDataset(data: list[TData], name: str | None = None, suffix: str = '', folder: Path | None = None)#

Base class for Dataset objects.

Datasets are collections of Data, with methods that simplify repeated operations on the data.

Instantiate a Dataset object from the Data objects.

property base_name: str#: Name of the dataset without suffix.

property begin: Timestamp#: Begin of the first data object.

property data_duration: Timedelta#: Most frequent duration among the data objects of this dataset, rounded to the nearest second.
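As an illustration of the data_duration rule, the sketch below picks the most frequent duration from a list using only the standard library. The helper name most_frequent_duration is hypothetical, and rounding each duration before counting is an assumption about the order of operations, not confirmed osekit behaviour.

```python
from collections import Counter
from datetime import timedelta
from typing import List


def most_frequent_duration(durations: List[timedelta]) -> timedelta:
    """Round each duration to the nearest second, then return the most common one.

    Assumption: rounding happens before counting. Hypothetical helper, not osekit code.
    """
    rounded = [timedelta(seconds=round(d.total_seconds())) for d in durations]
    # most_common(1) returns a list with one (value, count) pair.
    return Counter(rounded).most_common(1)[0][0]


durations = [
    timedelta(seconds=10.2),
    timedelta(seconds=9.9),
    timedelta(seconds=10.0),
    timedelta(seconds=5.0),
]
typical = most_frequent_duration(durations)
```

Here three of the four durations round to 10 seconds, so typical is a 10-second timedelta.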

property end: Timestamp#: End of the last data object.

property files: set[TFile]#: All files referred to by the Dataset.

property folder: Path#: Folder in which the dataset files are located or to be written.

classmethod from_dict(dictionary: dict) → BaseDataset#

Deserialize a BaseDataset from a dictionary.

Parameters#

dictionary: dict: The serialized dictionary representing the BaseDataset.

Returns#

BaseDataset: The deserialized BaseDataset.

classmethod from_files(files: list[TFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, overlap: float = 0.0, name: str | None = None) → BaseDataset#

Return a BaseDataset object from a list of files.

Parameters#

files: list[TFile]: The list of files contained in the Dataset.
begin: Timestamp | None: Begin of the first data object. Defaults to the begin of the first file.
end: Timestamp | None: End of the last data object. Defaults to the end of the last file.
mode: Literal[“files”, “timedelta_total”, “timedelta_file”]: How the data objects of the dataset are created from the original files. “files”: one data object is created for each file. “timedelta_total”: data objects of duration data_duration are created from the begin timestamp to the end timestamp. “timedelta_file”: data objects of duration data_duration are created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
data_duration: Timedelta | None: Duration of the data objects. If mode is set to “files”, this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
overlap: float: Overlap percentage between consecutive data.
name: str|None: Name of the dataset.
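The “timedelta_total” mode can be pictured with the standard-library sketch below. It is not the osekit implementation: the helper name split_total is hypothetical, overlap is assumed to be a fraction in [0, 1) rather than a value in percent, and the last window is assumed to be allowed to extend past end.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def split_total(
    begin: datetime,
    end: datetime,
    data_duration: Optional[timedelta] = None,
    overlap: float = 0.0,
) -> List[Tuple[datetime, datetime]]:
    """Return (begin, end) pairs of windows laid out from begin to end.

    Hypothetical helper. Assumes overlap is a fraction of data_duration.
    """
    if data_duration is None:
        # No duration given: a single window covers the whole period.
        return [(begin, end)]
    if data_duration <= timedelta(0):
        raise ValueError("data_duration must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Consecutive windows start one step apart; overlap shortens the step.
    step = data_duration * (1.0 - overlap)
    windows = []
    start = begin
    while start < end:
        # The last window may end after `end` in this sketch.
        windows.append((start, start + data_duration))
        start += step
    return windows


windows = split_total(
    begin=datetime(2024, 1, 1, 0, 0, 0),
    end=datetime(2024, 1, 1, 0, 0, 30),
    data_duration=timedelta(seconds=10),
    overlap=0.5,
)
```

With a 10-second duration and an overlap of 0.5, a new window starts every 5 seconds, so the 30-second period yields six windows instead of three.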

Returns#

BaseDataset[TItem, TFile]: The BaseDataset object.

classmethod from_folder(folder: Path, strptime_format: str | None, file_class: type[TFile] = <class 'osekit.core_api.base_file.BaseFile'>, supported_file_extensions: list[str] | None = None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', overlap: float = 0.0, data_duration: Timedelta | None = None, first_file_begin: Timestamp | None = None, name: str | None = None) → BaseDataset#

Return a BaseDataset from a folder containing the base files.

Parameters#

folder: Path: The folder containing the files.
strptime_format: str | None: The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.
file_class: type[TFile]: Derived type of BaseFile used to instantiate the dataset.
supported_file_extensions: list[str]: List of supported file extensions for parsing TFiles.
begin: Timestamp | None: The begin of the dataset. Defaults to the begin of the first file.
end: Timestamp | None: The end of the dataset. Defaults to the end of the last file.
timezone: str | pytz.timezone | None: The timezone in which the file should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps’ timezone will be converted from the parsed timezone to the specified timezone.
mode: Literal[“files”, “timedelta_total”, “timedelta_file”]: How the data objects of the dataset are created from the original files. “files”: one data object is created for each file. “timedelta_total”: data objects of duration data_duration are created from the begin timestamp to the end timestamp. “timedelta_file”: data objects of duration data_duration are created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
overlap: float: Overlap percentage between consecutive data.
data_duration: Timedelta | None: Duration of the data objects. If mode is set to “files”, this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
first_file_begin: Timestamp | None: Timestamp of the first audio file being processed. Ignored if strptime_format is specified.
name: str|None: Name of the dataset.
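The interplay of strptime_format, timezone and first_file_begin can be sketched with the standard library alone. This is an illustration under stated assumptions, not the osekit implementation: the helpers format_to_regex, parse_begin and chain_begins are hypothetical, only a handful of strftime codes are supported, and returning the parsed value unchanged when no timezone is given is a choice made for this sketch.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical mapping from a few strftime codes to regex fragments.
_CODE_PATTERNS = {
    "%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}",
    "%H": r"\d{2}", "%M": r"\d{2}", "%S": r"\d{2}",
    "%z": r"[+-]\d{4}",
}


def format_to_regex(strptime_format: str) -> str:
    """Turn a strptime format into a regex that locates it in a filename."""
    pieces = []
    for part in re.split(r"(%[a-zA-Z])", strptime_format):
        if part in _CODE_PATTERNS:
            pieces.append(_CODE_PATTERNS[part])
        elif re.fullmatch(r"%[a-zA-Z]", part):
            raise ValueError(f"unsupported code in this sketch: {part}")
        else:
            pieces.append(re.escape(part))  # literal text such as "_"
    return "".join(pieces)


def parse_begin(
    filename: str, strptime_format: str, tz: Optional[timezone] = None
) -> datetime:
    """Parse the begin timestamp embedded in a filename."""
    match = re.search(format_to_regex(strptime_format), filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    begin = datetime.strptime(match.group(0), strptime_format)
    if tz is None:
        return begin  # sketch choice: keep the value as parsed
    if begin.tzinfo is None:
        return begin.replace(tzinfo=tz)  # localize a tz-naive timestamp
    return begin.astimezone(tz)  # convert from the parsed timezone


def chain_begins(
    first_file_begin: datetime, durations: List[timedelta]
) -> List[datetime]:
    """Without a format, each file begins where the previous one ends."""
    begins, current = [], first_file_begin
    for duration in durations:
        begins.append(current)
        current += duration
    return begins


naive = parse_begin("rec_20240102_030405.wav", "%Y%m%d_%H%M%S")
utc = parse_begin("rec_20240102_030405+0200.wav", "%Y%m%d_%H%M%S%z", timezone.utc)
chained = chain_begins(datetime(2024, 1, 1), [timedelta(minutes=10)] * 3)
```

In the second call the filename carries a +02:00 offset, so converting to UTC shifts the wall-clock time back by two hours.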

Returns#

BaseDataset: The base dataset.

classmethod from_json(file: Path) → BaseDataset#

Deserialize a BaseDataset from a JSON file.

Parameters#

file: Path: Path to the serialized JSON file representing the BaseDataset.

Returns#

BaseDataset: The deserialized BaseDataset.

property has_default_name: bool#: Return True if the dataset has a default name, False if it has a given name.

move_files(folder: Path) → None#

Move the dataset files to the destination folder.

Parameters#

folder: Path: Destination folder to which the dataset files will be moved.

property name: str#: Name of the dataset with suffix.

property suffix: str#

Suffix that is applied to the name of the dataset.

This is used by the public API to suffix multiple core_api datasets that are created simultaneously and share the same name with their specific type, e.g. _audio or _spectro.

to_dict() → dict#

Serialize a BaseDataset to a dictionary.

Returns#

dict: The serialized dictionary representing the BaseDataset.

write(folder: Path, link: bool = False, first: int = 0, last: int | None = None) → None#

Write all data objects in the specified folder.

Parameters#

folder: Path: Folder in which to write the data.
link: bool: If True, the Data will be bound to the written file. Its items will be replaced with a single item, which will match the whole new File.
first: int: Index of the first data object to write.
last: int | None: Index after the last data object to write.
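The first and last parameters read like a half-open range, in the way a Python slice works. The sketch below illustrates that reading with a hypothetical helper, select_to_write; that write() selects data objects in exactly this way is an assumption.

```python
from typing import List, Optional


def select_to_write(data: List[str], first: int = 0, last: Optional[int] = None) -> List[str]:
    """Return the items from index `first` up to, but not including, `last`.

    Hypothetical helper: assumes the range behaves like a Python slice.
    """
    return data[first:last]


data = ["d0", "d1", "d2", "d3", "d4"]
subset = select_to_write(data, first=1, last=3)  # items at indices 1 and 2
everything = select_to_write(data)  # defaults cover the whole list
```

Under this reading, writing a dataset in consecutive batches means reusing the previous last as the next first.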

write_json(folder: Path) → None#: Write a serialized BaseDataset to a JSON file.
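The relationship between to_dict, from_dict, write_json and from_json can be shown with a toy class. Everything in this sketch is hypothetical: ToyDataset, its dictionary layout and the JSON filename are placeholders, and unlike the documented write_json this version returns the written path for convenience.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class ToyDataset:
    """Stand-in class with the same four serialization entry points."""

    def __init__(self, name: str, items: List[int]) -> None:
        self.name = name
        self.items = items

    def to_dict(self) -> dict:
        return {"name": self.name, "items": self.items}

    @classmethod
    def from_dict(cls, dictionary: dict) -> "ToyDataset":
        return cls(name=dictionary["name"], items=dictionary["items"])

    def write_json(self, folder: Path) -> Path:
        # Assumption: the file is named after the dataset.
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{self.name}.json"
        path.write_text(json.dumps(self.to_dict()))
        return path

    @classmethod
    def from_json(cls, file: Path) -> "ToyDataset":
        return cls.from_dict(json.loads(file.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = ToyDataset("demo", [1, 2, 3])
    json_path = original.write_json(Path(tmp))
    restored = ToyDataset.from_json(json_path)
```

The JSON methods are thin wrappers around the dictionary methods, so a round trip through a file should give back an equivalent dictionary.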
