BaseDataset#
- class osekit.core_api.base_dataset.BaseDataset(data: list[TData], name: str | None = None, suffix: str = '', folder: Path | None = None)#
Base class for Dataset objects.
Datasets are collections of Data, with methods that simplify repeated operations on the data.
Instantiate a Dataset object from the Data objects.
- property base_name: str#
Name of the dataset without suffix.
- property begin: Timestamp#
Begin of the first data object.
- property data_duration: Timedelta#
Return the most frequent duration among the data of this dataset.
The duration is rounded to the nearest second.
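The "most frequent duration, rounded to the nearest second" rule can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `most_frequent_duration` is hypothetical, and the standard-library `timedelta` stands in for the `Timedelta` type used in the signatures.

```python
from collections import Counter
from datetime import timedelta


def most_frequent_duration(durations):
    """Return the most common duration after rounding each to the nearest second.

    Hypothetical helper for illustration only, not part of the library.
    """
    rounded = [timedelta(seconds=round(d.total_seconds())) for d in durations]
    # most_common(1) returns a list holding one (value, count) pair.
    return Counter(rounded).most_common(1)[0][0]


durations = [
    timedelta(seconds=10.2),
    timedelta(seconds=9.9),
    timedelta(seconds=4),
]
result = most_frequent_duration(durations)
```

Here two of the three durations round to 10 seconds, so `result` is a 10-second duration.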
- property end: Timestamp#
End of the last data object.
- property files: set[TFile]#
All files referred to by the Dataset.
- property folder: Path#
Folder in which the dataset files are located or to be written.
- classmethod from_dict(dictionary: dict) Self#
Deserialize a BaseDataset from a dictionary.
Parameters#
- dictionary: dict
The serialized dictionary representing the BaseDataset.
Returns#
- BaseDataset
The deserialized BaseDataset.
- classmethod from_files(files: list[TFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, overlap: float = 0.0, name: str | None = None, **kwargs) Self#
Return a Dataset object from a list of Files.
Parameters#
- files: list[TFile]
The list of files contained in the Dataset.
- begin: Timestamp | None
Begin of the first data object. Defaulted to the begin of the first file.
- end: Timestamp | None
End of the last data object. Defaulted to the end of the last file.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files.
"files": one data object will be created for each file.
"timedelta_total": data objects of duration data_duration will be created from the begin timestamp to the end timestamp.
"timedelta_file": data objects of duration data_duration will be created starting from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
- data_duration: Timedelta | None
Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- overlap: float
Overlap percentage between consecutive data.
- name: str | None
Name of the dataset.
- kwargs:
Keyword arguments to pass to the cls.data_from_files() method.
Returns#
- Self:
The Dataset object.
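As a rough illustration of the "timedelta_total" mode, the sketch below computes the time bounds of consecutive data objects between begin and end. It is a sketch under stated assumptions, not the library's code: the helper `window_bounds` is hypothetical, overlap is assumed to be a fraction in [0, 1), and the last window is allowed to extend past end (the library may handle that edge differently).

```python
from datetime import datetime, timedelta


def window_bounds(begin, end, data_duration, overlap=0.0):
    """List (start, stop) pairs of fixed-duration windows covering [begin, end).

    Hypothetical helper: assumes overlap is a fraction in [0, 1).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Consecutive windows start one 'step' apart; overlap shortens the step.
    step = data_duration * (1.0 - overlap)
    bounds = []
    start = begin
    while start < end:
        bounds.append((start, start + data_duration))
        start += step
    return bounds


begin = datetime(2024, 1, 1)
end = begin + timedelta(seconds=30)
no_overlap = window_bounds(begin, end, timedelta(seconds=10))
half_overlap = window_bounds(begin, end, timedelta(seconds=10), overlap=0.5)
```

With a 30-second span and 10-second windows, the call without overlap yields three windows, and a 0.5 overlap halves the step to 5 seconds, yielding six.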
- classmethod from_folder(folder: Path, strptime_format: str | None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', overlap: float = 0.0, data_duration: Timedelta | None = None, first_file_begin: Timestamp | None = None, name: str | None = None, **kwargs) Self#
Return a Dataset from a folder containing the base files.
Parameters#
- folder: Path
The folder containing the files.
- strptime_format: str | None
The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.
- begin: Timestamp | None
The begin of the dataset. Defaulted to the begin of the first file.
- end: Timestamp | None
The end of the dataset. Defaulted to the end of the last file.
- timezone: str | pytz.timezone | None
The timezone in which the file should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps' timezone will be converted from the parsed timezone to the specified timezone.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files.
"files": one data object will be created for each file.
"timedelta_total": data objects of duration data_duration will be created from the begin timestamp to the end timestamp.
"timedelta_file": data objects of duration data_duration will be created starting from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
- overlap: float
Overlap percentage between consecutive data.
- data_duration: Timedelta | None
Duration of the data objects. If mode is set to "files", this parameter has no effect. If provided, data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- first_file_begin: Timestamp | None
Timestamp of the first audio file being processed. Ignored if strptime_format is specified.
- name: str | None
Name of the dataset.
- kwargs:
Keyword arguments to pass to the cls.from_files() method.
Returns#
- Self:
The dataset.
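The two ways of timestamping files described for strptime_format can be sketched with the standard library alone. The filename stem and the helper `consecutive_begins` are hypothetical, and how the library extracts the timestamp portion from a full filename is not shown here.

```python
from datetime import datetime, timedelta

# Case 1: a strptime format is given, so the begin is parsed from the
# timestamp portion of the filename (hypothetical stem shown here).
parsed = datetime.strptime("2023_04_05_14_49_06", "%Y_%m_%d_%H_%M_%S")


# Case 2: strptime_format is None, so the first file starts at
# first_file_begin and each following file starts where the previous ends.
def consecutive_begins(first_file_begin, durations):
    """Hypothetical helper: chain file begins from their durations."""
    begins = []
    current = first_file_begin
    for duration in durations:
        begins.append(current)
        current += duration
    return begins


begins = consecutive_begins(
    datetime(2023, 4, 5, 14, 49, 6),
    [timedelta(minutes=10), timedelta(minutes=10), timedelta(minutes=5)],
)
```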
- classmethod from_json(file: Path) Self#
Deserialize a BaseDataset from a JSON file.
Parameters#
- file: Path
Path to the serialized JSON file representing the BaseDataset.
Returns#
- BaseDataset
The deserialized BaseDataset.
- property has_default_name: bool#
Return True if the dataset has a default name, False if it has a given name.
- move_files(folder: Path) None#
Move the dataset files to the destination folder.
Parameters#
- folder: Path
Destination folder in which the dataset files will be moved.
- property name: str#
Name of the dataset with suffix.
- property suffix: str#
Suffix that is applied to the name of the dataset.
This is used by the public API to suffix multiple core_api datasets that are created simultaneously and share the same name with their specific type, e.g. _audio or _spectro.
- to_dict() dict#
Serialize a BaseDataset to a dictionary.
Returns#
- dict:
The serialized dictionary representing the BaseDataset.
- write(folder: Path, first: int = 0, last: int | None = None, *, link: bool = False) None#
Write all data objects in the specified folder.
Parameters#
- folder: Path
Folder in which to write the data.
- link: bool
If True, the Data will be bound to the written file. Its items will be replaced with a single item, which will match the whole new File.
- first: int
Index of the first data object to write.
- last: int | None
Index after the last data object to write.
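Since last is described as the index after the last data object to write, first and last read like a half-open Python slice. The sketch below illustrates that reading only. The helper `select_for_writing` is hypothetical, and treating last=None as "through the final data object" is an assumption based on the default value.

```python
def select_for_writing(data, first=0, last=None):
    """Return the data objects in the half-open index range [first, last).

    Hypothetical helper: last=None is assumed to mean "up to the end".
    """
    stop = len(data) if last is None else last
    return data[first:stop]


data = ["d0", "d1", "d2", "d3", "d4"]
subset = select_for_writing(data, first=1, last=3)
everything = select_for_writing(data)
```

Under this reading, `first=1, last=3` selects the data objects at indices 1 and 2, and the defaults select all of them.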
- write_json(folder: Path) None#
Write a serialized BaseDataset to a JSON file.