BaseDataset#
- class osekit.core_api.base_dataset.BaseDataset(data: list[TData], name: str | None = None, suffix: str = '', folder: Path | None = None)#
Base class for Dataset objects.
Datasets are collections of Data, with methods that simplify repeated operations on the data.
Instantiate a Dataset object from the Data objects.
- property base_name: str#
Name of the dataset without suffix.
- property begin: Timestamp#
Begin of the first data object.
- property data_duration: Timedelta#
Return the most frequent duration among durations of the data of this dataset, rounded to the nearest second.
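The computation described above can be sketched with the standard library. This is an illustrative stand-in, not the library's code: `most_frequent_duration` is a hypothetical helper, `datetime.timedelta` stands in for `Timedelta`, and the sketch assumes each duration is rounded before the durations are counted.

```python
from collections import Counter
from datetime import timedelta


def most_frequent_duration(durations: list[timedelta]) -> timedelta:
    """Round each duration to the nearest second, then return the most common one.

    Hypothetical helper: assumes rounding happens before counting.
    """
    rounded = [timedelta(seconds=round(d.total_seconds())) for d in durations]
    return Counter(rounded).most_common(1)[0][0]


durations = [
    timedelta(seconds=9.6),   # rounds to 10 s
    timedelta(seconds=10.2),  # rounds to 10 s
    timedelta(seconds=10),    # already 10 s
    timedelta(seconds=4.3),   # rounds to 4 s (e.g. a shorter trailing object)
]
typical = most_frequent_duration(durations)
print(typical)
```

A single shorter data object at the end of a dataset therefore does not change the reported duration.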
- property end: Timestamp#
End of the last data object.
- property files: set[TFile]#
All files referred to by the Dataset.
- property folder: Path#
Folder in which the dataset files are located or to be written.
- classmethod from_dict(dictionary: dict) BaseDataset #
Deserialize a BaseDataset from a dictionary.
Parameters#
- dictionary: dict
The serialized dictionary representing the BaseDataset.
Returns#
- BaseDataset
The deserialized BaseDataset.
- classmethod from_files(files: list[TFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None) BaseDataset #
Return a BaseDataset object from a list of Files.
Parameters#
- files: list[TFile]
The list of files contained in the Dataset.
- begin: Timestamp | None
Begin of the first data object. Defaulted to the begin of the first file.
- end: Timestamp | None
End of the last data object. Defaulted to the end of the last file.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files. 'files': one data object is created for each file. 'timedelta_total': data objects of duration data_duration are created from the begin timestamp to the end timestamp. 'timedelta_file': data objects of duration data_duration are created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin in a gap between two files. The next data object is then created from the beginning of the next original file, and so on.
- data_duration: Timedelta | None
Duration of the data objects. If mode is set to 'files', this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- name: str | None
Name of the dataset.
Returns#
BaseDataset[TItem, TFile]: The BaseDataset object.
- classmethod from_folder(folder: Path, strptime_format: str, file_class: type[TFile] = <class 'osekit.core_api.base_file.BaseFile'>, supported_file_extensions: list[str] | None = None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None) BaseDataset #
Return a BaseDataset from a folder containing the base files.
Parameters#
- folder: Path
The folder containing the files.
- strptime_format: str
The strptime format of the timestamps in the file names.
- file_class: type[TFile]
Derived type of BaseFile used to instantiate the dataset.
- supported_file_extensions: list[str] | None
List of supported file extensions for parsing TFiles.
- begin: Timestamp | None
The begin of the dataset. Defaulted to the begin of the first file.
- end: Timestamp | None
The end of the dataset. Defaulted to the end of the last file.
- timezone: str | pytz.timezone | None
The timezone in which the files should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps will be converted from the parsed timezone to the specified timezone.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files. 'files': one data object is created for each file. 'timedelta_total': data objects of duration data_duration are created from the begin timestamp to the end timestamp. 'timedelta_file': data objects of duration data_duration are created starting from the beginning of the first file that contains the begin timestamp, until the next data object would begin in a gap between two files. The next data object is then created from the beginning of the next original file, and so on.
- data_duration: Timedelta | None
Duration of the data objects. If mode is set to 'files', this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- name: str | None
Name of the dataset.
Returns#
- BaseDataset
The base dataset.
- classmethod from_json(file: Path) BaseDataset #
Deserialize a BaseDataset from a JSON file.
Parameters#
- file: Path
Path to the serialized JSON file representing the BaseDataset.
Returns#
- BaseDataset
The deserialized BaseDataset.
- property has_default_name: bool#
Return True if the dataset has a default name, False if it has a given name.
- move_files(folder: Path) None #
Move the dataset files to the destination folder.
Parameters#
- folder: Path
Destination folder in which the dataset files will be moved.
- property name: str#
Name of the dataset with suffix.
- property suffix: str#
Suffix that is applied to the name of the dataset.
This is used by the public API to suffix multiple core_api datasets that are created simultaneously and share the same name with their specific type,
e.g. _audio or _spectro.
- to_dict() dict #
Serialize a BaseDataset to a dictionary.
Returns#
- dict
The serialized dictionary representing the BaseDataset.
- write(folder: Path, link: bool = False, first: int = 0, last: int | None = None) None #
Write all data objects in the specified folder.
Parameters#
- folder: Path
Folder in which to write the data.
- link: bool
If True, the Data will be bound to the written file. Its items will be replaced with a single item, which will match the whole new File.
- first: int
Index of the first data object to write.
- last: int | None
Index after the last data object to write.
- write_json(folder: Path) None #
Write a serialized BaseDataset to a JSON file.
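The four serialization methods (to_dict, from_dict, write_json, from_json) are meant to round-trip a dataset. The sketch below shows that pattern on a toy class. Everything in it is hypothetical: `ToyDataset`, its fields, the dictionary layout and the `<name>.json` file name are assumptions made for illustration, and its `write_json` returns the written path for convenience whereas the documented method returns None.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ToyDataset:
    """Hypothetical stand-in used to illustrate the round trip."""

    name: str
    data: list[str]

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, dictionary: dict) -> "ToyDataset":
        return cls(name=dictionary["name"], data=list(dictionary["data"]))

    def write_json(self, folder: Path) -> Path:
        # Assumption: the file is named after the dataset.
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{self.name}.json"
        path.write_text(json.dumps(self.to_dict()))
        return path

    @classmethod
    def from_json(cls, file: Path) -> "ToyDataset":
        return cls.from_dict(json.loads(file.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = ToyDataset(name="toy", data=["a", "b"])
    json_path = original.write_json(Path(tmp))
    restored = ToyDataset.from_json(json_path)

print(restored == original)
```

Routing from_json through from_dict keeps a single place where the dictionary layout is interpreted.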