Dataset#

Main class of the Public API.

The Dataset corresponds to a collection of audio, spectro and auxiliary core_api datasets. It has additional metadata that can be exported, e.g. to APLOSE.

Initialize a Dataset.

Parameters#

folder: Path: Path to the folder containing the original audio files.
strptime_format: str: The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/).
gps_coordinates: str | list | tuple: GPS coordinates of the location where the audio files were recorded.
depth: float: Depth at which the audio files were recorded.
timezone: str | None: Timezone in which the audio data will be located. If the audio file timestamps are parsed with a tz-aware strptime_format (%z or %Z code), the AudioFiles will be converted from the parsed timezone to the specified timezone.
datasets: dict | None: Core API datasets that already belong to this dataset. Mainly used for deserialization.
job_builder: Job_builder | None: If None, analyses from this Dataset will be run locally. Otherwise, PBS job files will be created and submitted when analyses are run. See the osekit.job module for more info.
instrument: Instrument | None: Instrument that might be used to obtain acoustic pressure from the wav audio data. See the osekit.core_api.instrument module for more info.
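
As a rough illustration of how strptime_format and timezone interact, the sketch below uses only the standard library. The file name, its prefix and suffix, and the helper parse_filename_timestamp are hypothetical and are not part of OSEkit; the way OSEkit itself locates the timestamp inside a file name may differ.

```python
from datetime import datetime, timedelta, timezone


def parse_filename_timestamp(
    filename: str, strptime_format: str, prefix: str, suffix: str
) -> datetime:
    """Hypothetical helper: strip a known prefix/suffix, then parse the rest."""
    stamp = filename.removeprefix(prefix).removesuffix(suffix)
    return datetime.strptime(stamp, strptime_format)


# Hypothetical file name whose timestamp carries a UTC offset (%z code).
filename = "hydrophone_2023-04-05T12-30-00+0200.wav"
strptime_format = "%Y-%m-%dT%H-%M-%S%z"

# Because the format contains %z, the parsed datetime is tz-aware.
parsed = parse_filename_timestamp(filename, strptime_format, "hydrophone_", ".wav")

# Converting to the target timezone keeps the same instant in time:
# 12:30 at UTC+02:00 is 10:30 in UTC.
converted = parsed.astimezone(timezone.utc)
```

With a tz-aware format, the conversion changes only the displayed offset, not the instant at which the recording started.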

property analyses: list[str]#: Return the list of the names of the analyses run with this Dataset.

build() → None#

Build the Dataset.

Building a dataset moves the original audio files to a specific folder and creates the CSV metadata used by APLOSE.

delete_analysis(analysis_name: str) → None#

Delete all output datasets from an analysis.

WARNING: all the analysis output files will be deleted.

export_analysis(analysis_type: AnalysisType, ads: AudioDataset | None = None, sds: SpectroDataset | LTASDataset | None = None, link: bool = False, subtype: str | None = None, matrix_folder_name: str = 'matrix', spectrogram_folder_name: str = 'spectrogram', welch_folder_name: str = 'welch') → None#

Perform an analysis and write the results on disk.

An analysis is defined as a manipulation of the original audio files: reshaping the audio, exporting spectrograms or npz matrices (or a mix of those three) are examples of analyses. If self.job_builder is not None, the tasks are distributed across self.job_builder.nb_jobs jobs; otherwise they are run locally.

Parameters#

spectrogram_folder_name: str: The name of the folder in which the png spectrograms will be exported (relative to sds.folder).
matrix_folder_name: str: The name of the folder in which the npz matrices will be exported (relative to sds.folder).
welch_folder_name: str: The name of the folder in which the npz welch files will be exported (relative to sds.folder).
sds: SpectroDataset | LTASDataset: The SpectroDataset on which the data should be written.
analysis_type: AnalysisType: Type of the analysis to be performed. AudioDataset and SpectroDataset instances will be created depending on the flags. See osekit.public_api.analysis.AnalysisType docstring for more information.
ads: AudioDataset: The AudioDataset on which the data should be written.
link: bool: If set to True, the ads data will be linked to the exported files.
subtype: str | None: The subtype of the audio files as provided by the soundfile module.
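
The relationship between the analysis flags and the three folder-name parameters can be sketched as follows. AnalysisTypeSketch is a stand-in enum written for this example, and its member names are assumptions; refer to the osekit.public_api.analysis.AnalysisType docstring for the actual flags. The helper output_folders is likewise hypothetical. Only the default folder names and the "relative to sds.folder" rule come from the signature and parameter descriptions above.

```python
from enum import Flag, auto
from pathlib import Path


class AnalysisTypeSketch(Flag):
    """Stand-in for AnalysisType; member names are assumptions."""

    AUDIO = auto()
    MATRIX = auto()
    SPECTROGRAM = auto()
    WELCH = auto()


def output_folders(
    analysis_type: AnalysisTypeSketch,
    sds_folder: Path,
    matrix_folder_name: str = "matrix",
    spectrogram_folder_name: str = "spectrogram",
    welch_folder_name: str = "welch",
) -> dict:
    """Hypothetical helper: map each requested spectral output to its folder."""
    folders = {}
    if AnalysisTypeSketch.MATRIX in analysis_type:
        folders["matrix"] = sds_folder / matrix_folder_name
    if AnalysisTypeSketch.SPECTROGRAM in analysis_type:
        folders["spectrogram"] = sds_folder / spectrogram_folder_name
    if AnalysisTypeSketch.WELCH in analysis_type:
        folders["welch"] = sds_folder / welch_folder_name
    return folders


# Flags combine with the | operator: request npz matrices and png spectrograms.
requested = AnalysisTypeSketch.MATRIX | AnalysisTypeSketch.SPECTROGRAM
folders = output_folders(requested, Path("processed/my_analysis"))
```

Here folders contains one entry per requested spectral output, each resolved relative to the (hypothetical) SpectroDataset folder.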

classmethod from_dict(dictionary: dict) → Dataset#

Deserialize a dataset from a dictionary.

Parameters#

dictionary: dict: The serialized dictionary representing the dataset.

Returns#

Dataset: The deserialized dataset.

classmethod from_json(file: Path) → Dataset#

Deserialize a Dataset from a JSON file.

Parameters#

file: Path: Path to the serialized JSON file representing the Dataset.

Returns#

Dataset: The deserialized Dataset.

get_analysis_audiodataset(analysis: Analysis) → AudioDataset#

Return an AudioDataset created from the analysis parameters.

Parameters#

analysis: Analysis: Analysis for which to generate an AudioDataset object.

Returns#

AudioDataset: The AudioDataset that matches the analysis parameters. This AudioDataset can be used either to have a peek at the analysis output, or to edit the analysis (adding/removing data) by editing it and passing it as a parameter to the Dataset.run_analysis() method.

get_analysis_spectrodataset(analysis: Analysis, audio_dataset: AudioDataset | None = None) → SpectroDataset | LTASDataset#

Return a SpectroDataset (or LTASDataset) created from analysis parameters.

Parameters#

analysis: Analysis: Analysis for which to generate a SpectroDataset (or LTASDataset) object.
audio_dataset: AudioDataset | None: If provided, the SpectroDataset will be initialized from this AudioDataset. This can be used to edit the analysis (e.g. adding/removing data) before running it.

Returns#

SpectroDataset | LTASDataset: The SpectroDataset that matches the analysis parameters. This SpectroDataset can be used, for example, to have a peek at the analysis output before running it. If Analysis.is_ltas is True, an LTASDataset is returned.

get_dataset(dataset_name: str) → type[DatasetChild] | None#

Get an analysis dataset from its name.

Parameters#

dataset_name: str: Name of the analysis dataset.

Returns#

type[DatasetChild] | None: Analysis dataset from the dataset.datasets property.

get_datasets_by_analysis(analysis_name: str) → list[type[DatasetChild]]#

Get all output datasets from a given analysis.

Parameters#

analysis_name: str: Name of the analysis of which to get the output datasets.

Returns#

list[type[DatasetChild]]: List of the analysis output datasets.
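
The two lookups above (by dataset name and by analysis name) can be pictured with a small stand-alone sketch. The registry layout used here, a dict mapping each dataset name to its analysis name and dataset object, is an assumption made for illustration; this page does not describe how Dataset.datasets is actually stored. The placeholder strings stand in for real core_api dataset objects.

```python
from typing import Optional

# Hypothetical registry: dataset name -> analysis name and dataset object.
datasets = {
    "spectro_audio": {"analysis": "spectro", "dataset": "AudioDataset placeholder"},
    "spectro": {"analysis": "spectro", "dataset": "SpectroDataset placeholder"},
    "reshape": {"analysis": "reshape", "dataset": "AudioDataset placeholder"},
}


def get_dataset(registry: dict, dataset_name: str) -> Optional[object]:
    """Return the dataset registered under dataset_name, or None if absent."""
    entry = registry.get(dataset_name)
    return None if entry is None else entry["dataset"]


def get_datasets_by_analysis(registry: dict, analysis_name: str) -> list:
    """Return every dataset that was produced by the given analysis."""
    return [
        entry["dataset"]
        for entry in registry.values()
        if entry["analysis"] == analysis_name
    ]


spectro_outputs = get_datasets_by_analysis(datasets, "spectro")
missing = get_dataset(datasets, "unknown")
```

In this sketch a single analysis can own several output datasets, which is why the lookup by analysis name returns a list while the lookup by dataset name returns at most one object.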

property origin_dataset: AudioDataset#: Return the AudioDataset from which this Dataset has been built.

property origin_files: list[AudioFile] | None#: Return the original audio files from which this Dataset has been built.

rename_analysis(analysis_name: str, new_analysis_name: str) → None#

Rename an analysis that has already been run.

Parameters#

analysis_name: str: Name of the analysis to rename.
new_analysis_name: str: New name for the analysis.

reset() → None#

Reset the Dataset.

Resetting a dataset will move back the original audio files and the content of the “other” folder to the root folder. WARNING: all other files and folders will be deleted.

run_analysis(analysis: Analysis, audio_dataset: AudioDataset | None = None) → None#

Create a new analysis dataset from the original audio files.

The analysis parameter sets which type(s) of core_api dataset(s) will be created and added to the Dataset.datasets property, plus which output files will be written to disk (reshaped audio files, npz spectra matrices, png spectrograms…).

Parameters#

analysis: Analysis: Analysis to run. Contains the analysis type and required info. See the public_api.Analysis.Analysis docstring for more info.
audio_dataset: AudioDataset | None: If provided, the analysis will be run on this AudioDataset. Otherwise, an AudioDataset will be created from the analysis parameters. This can be used to edit the analysis AudioDataset (adding/removing AudioData etc.) before running the analysis.

to_dict() → dict#

Serialize a dataset to a dictionary.

Returns#

dict: The serialized dictionary representing the dataset.

write_json(folder: Path | None = None) → None#: Write a serialized Dataset to a JSON file.
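
The serialization methods (to_dict, from_dict, write_json, from_json) follow a familiar round-trip pattern, sketched below with a stand-in class. MiniDataset, its three fields, and the dataset.json file name are hypothetical and chosen only for this example; the real Dataset serializes more state, and its JSON file name is not specified on this page.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

JSON_NAME = "dataset.json"  # hypothetical file name, for this sketch only


@dataclass
class MiniDataset:
    """Stand-in for Dataset that mirrors only its serialization methods."""

    folder: Path
    strptime_format: str
    depth: float

    def to_dict(self) -> dict:
        # Paths are not JSON-serializable, so they are stored as strings.
        return {
            "folder": str(self.folder),
            "strptime_format": self.strptime_format,
            "depth": self.depth,
        }

    @classmethod
    def from_dict(cls, dictionary: dict) -> "MiniDataset":
        return cls(
            folder=Path(dictionary["folder"]),
            strptime_format=dictionary["strptime_format"],
            depth=dictionary["depth"],
        )

    def write_json(self, folder: Optional[Path] = None) -> None:
        # Assumption: fall back to the dataset folder when none is given.
        target = self.folder if folder is None else folder
        target.mkdir(parents=True, exist_ok=True)
        (target / JSON_NAME).write_text(json.dumps(self.to_dict(), indent=2))

    @classmethod
    def from_json(cls, file: Path) -> "MiniDataset":
        return cls.from_dict(json.loads(file.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = MiniDataset(Path(tmp), "%Y%m%d_%H%M%S", 10.0)
    original.write_json()
    restored = MiniDataset.from_json(Path(tmp) / JSON_NAME)

round_trip_ok = restored == original
```

The point of the pattern is that from_json(write_json(...)) yields an object equal to the original, which is what makes the datasets parameter of the constructor useful for deserialization.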
