AudioDataset#
- class osekit.core_api.audio_dataset.AudioDataset(data: list[AudioData], name: str | None = None, suffix: str = '', folder: Path | None = None, instrument: Instrument | None = None)#
AudioDataset is a collection of AudioData objects, with methods that simplify repeated operations on the audio data.
Initialize an AudioDataset.
- classmethod from_files(files: list[AudioFile], begin: Timestamp | None = None, end: Timestamp | None = None, name: str | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', overlap: float = 0.0, data_duration: Timedelta | None = None, sample_rate: float | None = None, instrument: Instrument | None = None, normalization: Normalization = <Normalization.RAW: 1>) AudioDataset#
Return an AudioDataset object from a list of AudioFiles.
Parameters#
- files: list[AudioFile]
The list of files contained in the Dataset.
- begin: Timestamp | None
The beginning of the audio dataset. Defaults to the beginning of the first file.
- end: Timestamp | None
The end of the audio dataset. Defaults to the end of the last file.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files.
"files": one data object will be created for each file.
"timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp.
"timedelta_file": data objects of duration equal to data_duration will be created from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
- overlap: float
Overlap percentage between consecutive data.
- data_duration: Timedelta | None
Duration of the audio data objects. If mode is set to "files", this parameter has no effect. If provided, audio data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- sample_rate: float | None
Sample rate of the audio data objects.
- name: str | None
Name of the dataset.
- instrument: Instrument | None
Instrument that might be used to obtain acoustic pressure from the wav audio data.
- normalization: Normalization
The type of normalization to apply to the audio data.
Returns#
- AudioDataset:
The AudioDataset object.
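The "timedelta_total" mode and the overlap parameter can be pictured as a sliding window between begin and end. The following is a minimal standard-library sketch of that idea, not the library's implementation: the helper name timedelta_total_windows is hypothetical, overlap is assumed to be a fraction in [0, 1), and the last window is allowed to extend past end, which may differ from what AudioDataset.from_files() actually does.

```python
from datetime import datetime, timedelta


def timedelta_total_windows(begin, end, data_duration, overlap=0.0):
    """Hypothetical helper: list the (start, stop) windows starting in [begin, end).

    Each window lasts data_duration. Consecutive windows share a fraction
    `overlap` of their duration (assumed here to lie in [0, 1)).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of two consecutive windows.
    step = data_duration * (1.0 - overlap)
    windows = []
    start = begin
    while start < end:
        windows.append((start, start + data_duration))
        start += step
    return windows


begin = datetime(2024, 1, 1, 0, 0, 0)
end = datetime(2024, 1, 1, 0, 1, 0)

# One minute cut into 20 s windows, without and with 50 % overlap.
no_overlap = timedelta_total_windows(begin, end, timedelta(seconds=20))
half_overlap = timedelta_total_windows(begin, end, timedelta(seconds=20), overlap=0.5)
```

With no overlap, each window starts where the previous one stops. With overlap=0.5, each window starts halfway through the previous one.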
- classmethod from_folder(folder: Path, strptime_format: str | None, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', overlap: float = 0.0, data_duration: Timedelta | None = None, sample_rate: float | None = None, name: str | None = None, instrument: Instrument | None = None, normalization: Normalization = <Normalization.RAW: 1>, **kwargs) Self#
Return an AudioDataset from a folder containing the audio files.
Parameters#
- folder: Path
The folder containing the audio files.
- strptime_format: str | None
The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.
- begin: Timestamp | None
The beginning of the audio dataset. Defaults to the beginning of the first file.
- end: Timestamp | None
The end of the audio dataset. Defaults to the end of the last file.
- timezone: str | pytz.timezone | None
The timezone in which the file should be localized. If None, the file begin/end will be tz-naive. If different from a timezone parsed from the filename, the timestamps' timezone will be converted from the parsed timezone to the specified timezone.
- mode: Literal['files', 'timedelta_total', 'timedelta_file']
Mode of creation of the dataset data from the original files.
"files": one data object will be created for each file.
"timedelta_total": data objects of duration equal to data_duration will be created from the begin timestamp to the end timestamp.
"timedelta_file": data objects of duration equal to data_duration will be created from the beginning of the first file that contains the begin timestamp, until a data object would begin between two files. The next data object is then created from the beginning of the next original file, and so on.
- overlap: float
Overlap percentage between consecutive data.
- data_duration: Timedelta | None
Duration of the audio data objects. If mode is set to "files", this parameter has no effect. If provided, audio data will be evenly distributed between begin and end. Otherwise, one data object will cover the whole time period.
- sample_rate: float | None
Sample rate of the audio data objects.
- name: str | None
Name of the dataset.
- instrument: Instrument | None
Instrument that might be used to obtain acoustic pressure from the wav audio data.
- normalization: Normalization
The type of normalization to apply to the audio data.
- kwargs: any
Keyword arguments passed to the BaseDataset.from_folder() classmethod.
Returns#
- AudioDataset:
The audio dataset.
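The strptime_format and timezone parameters describe how file begin timestamps are derived. The sketch below illustrates the three documented cases with the standard library only: a tz-naive timestamp parsed from a filename, a tz-aware timestamp converted to another timezone, and files chained one after another when no format is given. The helper names begin_from_filename and chained_begins, the example filenames, and the assumption that the whole file stem matches the format are all illustrative and are not taken from the library.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def begin_from_filename(filename, strptime_format):
    """Hypothetical helper: parse a begin timestamp from a file stem.

    Assumes the whole stem (filename without extension) matches the format.
    """
    return datetime.strptime(Path(filename).stem, strptime_format)


def chained_begins(first_file_begin, durations):
    """Hypothetical helper: each file starts at the end of the previous one."""
    begins = []
    current = first_file_begin
    for duration in durations:
        begins.append(current)
        current += duration
    return begins


# No timezone in the filename: the parsed timestamp is tz-naive.
naive = begin_from_filename("2023_04_05_15_49_06.wav", "%Y_%m_%d_%H_%M_%S")

# Timezone parsed from the filename (%z), then converted to another timezone.
aware = begin_from_filename("20230405T154906+0200.wav", "%Y%m%dT%H%M%S%z")
as_utc = aware.astimezone(timezone.utc)

# No strptime format: three 10-minute files chained from a given first begin.
chained = chained_begins(datetime(2023, 4, 5), [timedelta(minutes=10)] * 3)
```

Converting with astimezone() keeps the same instant and only changes how it is expressed, which matches the conversion described for the timezone parameter.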
- classmethod from_json(file: Path) Self#
Deserialize an AudioDataset from a JSON file.
Parameters#
- file: Path
Path to the serialized JSON file representing the AudioDataset.
Returns#
- AudioDataset
The deserialized AudioDataset.
- property instrument: Instrument | None#
Instrument that can be used to get acoustic pressure from wav audio data.
- property normalization: Normalization#
Return the most frequent normalization among those of this dataset's data.
- property sample_rate: set[float] | float#
Return the most frequent sample rate among those of this dataset's data.
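The "most frequent value" rule used by these properties can be illustrated with collections.Counter. This is only a sketch of the idea: the helper name most_frequent is hypothetical, and the way the library resolves ties (the declared return type of sample_rate also allows a set) is not covered here.

```python
from collections import Counter


def most_frequent(values):
    """Hypothetical helper: return the value that occurs most often.

    On a tie, Counter.most_common keeps the value encountered first.
    """
    return Counter(values).most_common(1)[0][0]


# Sample rates of the individual data objects (illustrative values).
sample_rates = [48_000, 48_000, 44_100]
dataset_rate = most_frequent(sample_rates)
```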
- write(folder: Path, first: int = 0, last: int | None = None, *, subtype: str | None = None, link: bool = False) None#
Write all data objects in the specified folder.
Parameters#
- folder: Path
Folder in which to write the data.
- first: int
Index of the first AudioData object to write.
- last: int | None
Index after the last AudioData object to write.
- subtype: str | None
Subtype as provided by the soundfile module. Defaults to 16-bit PCM for WAV audio files.
- link: bool
If True, each AudioData will be bound to the corresponding written file. Their items will be replaced with a single item, which will match the whole new AudioFile.
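Since last is described as the index after the last object to write, first and last appear to define a half-open range, like a Python slice. The sketch below shows that reading on a plain list. The helper name select_for_write is hypothetical and the strings stand in for AudioData objects.

```python
def select_for_write(data, first=0, last=None):
    """Hypothetical helper: items whose index lies in [first, last).

    With last=None, every item from `first` onward is selected.
    """
    return data[first:last]


data = ["data_0", "data_1", "data_2", "data_3"]

middle = select_for_write(data, first=1, last=3)  # indices 1 and 2
tail = select_for_write(data, first=2)            # indices 2 and 3
everything = select_for_write(data)               # default arguments
```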