AudioDataset#

class osekit.core_api.audio_dataset.AudioDataset(data: list[AudioData], name: str | None = None, suffix: str = '', folder: Path | None = None, instrument: Instrument | None = None)#

AudioDataset is a collection of AudioData objects, with methods that simplify repeated operations on the audio data.

Initialize an AudioDataset.

classmethod from_base_dataset(base_dataset: BaseDataset, sample_rate: float | None = None, name: str | None = None, instrument: Instrument | None = None) AudioDataset#

Return an AudioDataset object from a BaseDataset object.

classmethod from_dict(dictionary: dict) AudioDataset#

Deserialize an AudioDataset from a dictionary.

Parameters#

dictionary: dict

The serialized dictionary representing the AudioDataset.

Returns#

AudioDataset

The deserialized AudioDataset.

classmethod from_files(files: list[AudioFile], begin: Timestamp | None = None, end: Timestamp | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None, instrument: Instrument | None = None) AudioDataset#

Return an AudioDataset object from a list of AudioFiles.

Parameters#

files: list[AudioFile]

The list of files contained in the Dataset.

begin: Timestamp | None

Start of the first data object. Defaults to the start of the first file.

end: Timestamp | None

End of the last data object. Defaults to the end of the last file.

mode: Literal["files", "timedelta_total", "timedelta_file"]

How the dataset's data objects are created from the original files. "files": one data object is created for each file. "timedelta_total": data objects of duration data_duration are created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration data_duration are created starting at the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object then starts at the beginning of the next original file, and so on.

data_duration: Timedelta | None

Duration of the data objects. If mode is set to "files", this parameter has no effect. Otherwise, if provided, data objects are evenly distributed between begin and end; if not provided, a single data object covers the whole time period.

name: str | None

Name of the dataset.

instrument: Instrument | None

Instrument that can be used to obtain acoustic pressure from the WAV audio data.

Returns#

AudioDataset

The AudioDataset object.
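The three modes described above can be illustrated with a small standalone sketch. It does not call OSEkit: the `segment` helper, the `(begin, end)` tuples standing in for AudioFile objects, and the example timestamps are all assumptions made for illustration, and the library's handling of edge cases (such as a last, partial data object) may differ.

```python
from datetime import datetime, timedelta


def segment(files, mode, data_duration=None, begin=None, end=None):
    """Return the (begin, end) spans of the data objects (illustrative only).

    `files` is a list of (begin, end) tuples sorted by begin time; it is a
    hypothetical stand-in for a list of AudioFile objects.
    """
    begin = files[0][0] if begin is None else begin
    end = files[-1][1] if end is None else end

    if mode == "files":
        # One data object per file; data_duration has no effect.
        return list(files)

    if data_duration is None:
        # No duration given: a single data object covers the whole period.
        return [(begin, end)]

    spans = []
    if mode == "timedelta_total":
        # Regular grid from begin to end, regardless of file boundaries.
        t = begin
        while t < end:
            spans.append((t, t + data_duration))
            t += data_duration
        return spans

    if mode == "timedelta_file":
        # Start at the beginning of the file that contains `begin`.
        t = next(fb for fb, fe in files if fb <= begin < fe)
        while t < end:
            if not any(fb <= t < fe for fb, fe in files):
                # The next object would begin in a gap between two files:
                # restart from the beginning of the next file.
                later = [fb for fb, _ in files if fb > t]
                if not later:
                    break
                t = later[0]
                continue
            spans.append((t, t + data_duration))
            t += data_duration
        return spans

    raise ValueError(f"Unknown mode: {mode}")


# Two hypothetical 10-minute files separated by a 5-minute gap.
day = datetime(2024, 1, 1)
files = [
    (day, day + timedelta(minutes=10)),
    (day + timedelta(minutes=15), day + timedelta(minutes=25)),
]
five_min = timedelta(minutes=5)

per_file = segment(files, "files")
total = segment(files, "timedelta_total", five_min)
by_file = segment(files, "timedelta_file", five_min)
```

In this sketch, "timedelta_total" produces a data object covering the gap between the two files, whereas "timedelta_file" skips the gap and resumes at the start of the second file.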

classmethod from_folder(folder: Path, strptime_format: str, begin: Timestamp | None = None, end: Timestamp | None = None, timezone: str | pytz.timezone | None = None, mode: Literal['files', 'timedelta_total', 'timedelta_file'] = 'timedelta_total', data_duration: Timedelta | None = None, name: str | None = None, instrument: Instrument | None = None, **kwargs: any) AudioDataset#

Return an AudioDataset from a folder containing the audio files.

Parameters#

folder: Path

The folder containing the audio files.

strptime_format: str

The strptime format of the timestamps in the audio file names.

begin: Timestamp | None

The start of the audio dataset. Defaults to the start of the first file.

end: Timestamp | None

The end of the audio dataset. Defaults to the end of the last file.

timezone: str | pytz.timezone | None

The timezone in which the file timestamps should be localized. If None, the file begin/end timestamps will be tz-naive. If it differs from a timezone parsed from the file name, the timestamps are converted from the parsed timezone to the specified timezone.

mode: Literal["files", "timedelta_total", "timedelta_file"]

How the dataset's data objects are created from the original files. "files": one data object is created for each file. "timedelta_total": data objects of duration data_duration are created from the begin timestamp to the end timestamp. "timedelta_file": data objects of duration data_duration are created starting at the beginning of the first file that contains the begin timestamp, until the next data object would begin between two files. The next data object then starts at the beginning of the next original file, and so on.

data_duration: Timedelta | None

Duration of the audio data objects. If mode is set to "files", this parameter has no effect. Otherwise, if provided, audio data objects are evenly distributed between begin and end; if not provided, a single data object covers the whole time period.

name: str | None

Name of the dataset.

instrument: Instrument | None

Instrument that can be used to obtain acoustic pressure from the WAV audio data.

kwargs: any

Keyword arguments passed to the BaseDataset.from_folder classmethod.

Returns#

AudioDataset

The audio dataset.
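The strptime_format and timezone parameters can be illustrated with the standard library alone. The sketch below is not OSEkit code: the file name, the way the timestamp substring is isolated, and the fixed UTC+1 offset are assumptions chosen for the example, and the signatures above use Timestamp objects and pytz timezones rather than the plain `datetime` objects shown here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical file name and matching strptime format.
filename = "recorder_20240301_123000.wav"
strptime_format = "%Y%m%d_%H%M%S"

# Isolate the timestamp substring (assumption: fixed prefix and extension).
stamp = filename.removeprefix("recorder_").removesuffix(".wav")

# Without any timezone information, the parsed timestamp is tz-naive.
naive = datetime.strptime(stamp, strptime_format)

# Hypothetical target timezone, written as a fixed UTC+1 offset.
target_tz = timezone(timedelta(hours=1))

# Case 1: no timezone in the file name, a timezone is requested.
# The wall-clock time is kept and simply labeled with the timezone.
localized = naive.replace(tzinfo=target_tz)

# Case 2: the file name carries its own offset (%z) that differs from the
# requested timezone. The instant is kept and the wall-clock time converted.
parsed = datetime.strptime("20240301_123000+0000", "%Y%m%d_%H%M%S%z")
converted = parsed.astimezone(target_tz)
```

Here `localized` keeps the 12:30 wall-clock time, while `converted` represents the same instant as `parsed` but reads 13:30 in the target timezone.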

classmethod from_json(file: Path) AudioDataset#

Deserialize an AudioDataset from a JSON file.

Parameters#

file: Path

Path to the serialized JSON file representing the AudioDataset.

Returns#

AudioDataset

The deserialized AudioDataset.

property instrument: Instrument | None#

Instrument that can be used to obtain acoustic pressure from the WAV audio data.

property sample_rate: set[float] | float#

Return the most frequent sample rate among the data objects of this dataset.
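As a rough illustration of what "most frequent" means here, the following sketch picks the most common value from a list of sample rates with `collections.Counter`. The values are hypothetical and this is not the library's implementation.

```python
from collections import Counter

# Hypothetical sample rates (in Hz) of the data objects in a dataset.
sample_rates = [48_000, 48_000, 96_000, 48_000]

# most_common(1) returns a list holding the (value, count) pair with the
# highest count.
most_frequent, count = Counter(sample_rates).most_common(1)[0]
```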

write(folder: Path, subtype: str | None = None, link: bool = False, first: int = 0, last: int | None = None) None#

Write all data objects in the specified folder.

Parameters#

folder: Path

Folder in which to write the data.

subtype: str | None

Subtype as provided by the soundfile module. Defaults to 16-bit PCM for WAV audio files.

link: bool

If True, each AudioData will be bound to the corresponding written file. Its items will be replaced with a single item that matches the whole new AudioFile.

first: int

Index of the first AudioData object to write.

last: int | None

Index after the last AudioData object to write.
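The first and last parameters read like the bounds of a Python slice, with last being exclusive. The sketch below assumes that interpretation and uses a plain list of placeholder strings in place of AudioData objects.

```python
# Placeholder names standing in for the AudioData objects of a dataset.
data = ["data_0", "data_1", "data_2", "data_3", "data_4"]

# first=1, last=3 selects indices 1 and 2 (last is exclusive).
subset = data[1:3]

# The defaults first=0, last=None select every data object.
everything = data[0:None]
```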