Dataset
- class osekit.public_api.dataset.Dataset(folder: Path, strptime_format: str, gps_coordinates: str | list | tuple = (0, 0), depth: float = 0.0, timezone: str | None = None, datasets: dict | None = None, job_builder: Job_builder | None = None, instrument: Instrument | None = None)
Main class of the Public API.
The Dataset corresponds to a collection of audio, spectro and auxiliary core_api datasets. It has additional metadata that can be exported, e.g. to APLOSE.
Initialize a Dataset.
Parameters
- folder: Path
Path to the folder containing the original audio files.
- strptime_format: str
The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/).
- gps_coordinates: str | list | tuple
GPS coordinates of the location where the audio files were recorded.
- depth: float
Depth at which the audio files were recorded.
- timezone: str | None
Timezone in which the audio data will be located. If the audio file timestamps are parsed with a tz-aware strptime_format (%z or %Z code), the AudioFiles will be converted from the parsed timezone to the specified timezone.
- datasets: dict | None
Core API datasets that already belong to this dataset. Mainly used for deserialization.
- job_builder: Job_builder | None
If None, analyses from this Dataset will be run locally. Otherwise, PBS job files will be created and submitted when analyses are run. See the osekit.job module for more info.
- instrument: Instrument | None
Instrument that might be used to obtain acoustic pressure from the wav audio data. See the osekit.core_api.instrument module for more info.
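The role of strptime_format and timezone can be illustrated with the standard library alone. The sketch below is not osekit code: parse_timestamp is a hypothetical helper, the file names are made up, and it assumes the whole file stem is the timestamp (how osekit actually locates the timestamp inside a file name is not described here). It shows a naive format, a tz-aware format using %z, and the conversion of the tz-aware result to another timezone.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_timestamp(filename: str, strptime_format: str) -> datetime:
    """Parse the timestamp carried by a file name (hypothetical helper).

    Assumption: the file stem contains nothing but the timestamp.
    """
    return datetime.strptime(Path(filename).stem, strptime_format)


# Naive format: no timezone information is parsed from the file name.
naive = parse_timestamp("2022_04_01_12_30_00.wav", "%Y_%m_%d_%H_%M_%S")

# tz-aware format: %z parses the UTC offset found in the file name.
aware = parse_timestamp("2022-04-01T12-30-00+0200.wav", "%Y-%m-%dT%H-%M-%S%z")

# Convert the tz-aware timestamp from its parsed timezone to a target one (UTC here).
aware_utc = aware.astimezone(timezone.utc)
```

With a tz-aware format the instant is preserved and only its representation changes, which is the conversion the timezone parameter describes.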
- build() → None
Build the Dataset.
Building a dataset moves the original audio files to a specific folder and creates metadata csv used by APLOSE.
- export_analysis(analysis_type: AnalysisType, ads: AudioDataset | None = None, sds: SpectroDataset | LTASDataset | None = None, link: bool = False, subtype: str | None = None, matrix_folder_name: str = 'matrix', spectrogram_folder_name: str = 'spectrogram', welch_folder_name: str = 'welch') → None
Perform an analysis and write the results on disk.
An analysis is defined as a manipulation of the original audio files: reshaping the audio, exporting spectrograms, exporting npz matrices, or any mix of those three. If self.job_builder is None, the analysis is run locally; otherwise, the tasks are distributed over self.job_builder.nb_jobs jobs.
Parameters
- spectrogram_folder_name: str
The name of the folder in which the png spectrograms will be exported (relative to sds.folder).
- matrix_folder_name: str
The name of the folder in which the npz matrices will be exported (relative to sds.folder).
- welch_folder_name: str
The name of the folder in which the npz welch files will be exported (relative to sds.folder).
- sds: SpectroDataset | LTASDataset
The SpectroDataset on which the data should be written.
- analysis_type: AnalysisType
Type of the analysis to be performed. AudioDataset and SpectroDataset instances will be created depending on the flags. See osekit.public_api.analysis.AnalysisType docstring for more information.
- ads: AudioDataset
The AudioDataset on which the data should be written.
- link: bool
If set to True, the ads data will be linked to the exported files.
- subtype: str | None
The subtype of the audio files as provided by the soundfile module.
- classmethod from_dict(dictionary: dict) → Dataset
Deserialize a Dataset from a dictionary.
Parameters
- dictionary: dict
The serialized dictionary representing the Dataset.
Returns
- Dataset
The deserialized Dataset.
- classmethod from_json(file: Path) → Dataset
Deserialize a Dataset from a JSON file.
Parameters
- file: Path
Path to the serialized JSON file representing the Dataset.
Returns
- Dataset
The deserialized Dataset.
- get_analysis_audiodataset(analysis: Analysis) → AudioDataset
Return an AudioDataset created from the analysis parameters.
Parameters
- analysis: Analysis
Analysis for which to generate an AudioDataset object.
Returns
- AudioDataset:
The AudioDataset that matches the analysis parameters. It can be used either to preview the analysis output, or to edit the analysis (adding/removing data) by editing it and passing it as a parameter to the Dataset.run_analysis() method.
- get_analysis_spectrodataset(analysis: Analysis, audio_dataset: AudioDataset | None = None) → SpectroDataset | LTASDataset
Return a SpectroDataset (or LTASDataset) created from the analysis parameters.
Parameters
- analysis: Analysis
Analysis for which to generate a SpectroDataset (or LTASDataset) object.
- audio_dataset: AudioDataset | None
If provided, the SpectroDataset will be initialized from this AudioDataset. This can be used to edit the analysis (e.g. adding/removing data) before running it.
Returns
- SpectroDataset | LTASDataset:
The SpectroDataset that matches the analysis parameters. It can be used, for example, to preview the analysis output before running it. If Analysis.is_ltas is True, an LTASDataset is returned.
- get_dataset(dataset_name: str) → type[DatasetChild] | None
Get an analysis dataset from its name.
Parameters
- dataset_name: str
Name of the analysis dataset.
Returns
- type[DatasetChild]:
Analysis dataset from the dataset.datasets property.
- property origin_dataset: AudioDataset
Return the AudioDataset from which this Dataset has been built.
- property origin_files: list[AudioFile] | None
Return the original audio files from which this Dataset has been built.
- reset() → None
Reset the Dataset.
Resetting a dataset will move back the original audio files and the content of the “other” folder to the root folder. WARNING: all other files and folders will be deleted.
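The build/reset pair can be pictured as a reversible file move. The sketch below is a simplified illustration and not osekit's implementation: the build and reset functions, the ORIGINAL_FOLDER name and the file names are all hypothetical, and the sketch neither writes the APLOSE metadata csv nor handles the "other" folder. It only shows original audio files moved into a dedicated folder and then moved back to the root.

```python
import shutil
import tempfile
from pathlib import Path

ORIGINAL_FOLDER = "original_audio"  # hypothetical name, not taken from osekit


def build(root: Path) -> list[Path]:
    """Move every .wav file found at the root into a dedicated subfolder."""
    target = root / ORIGINAL_FOLDER
    target.mkdir(exist_ok=True)
    files = sorted(root.glob("*.wav"))  # list the files before moving anything
    return [Path(shutil.move(str(f), str(target / f.name))) for f in files]


def reset(root: Path) -> None:
    """Move the audio files back to the root and delete the subfolder."""
    target = root / ORIGINAL_FOLDER
    for f in sorted(target.glob("*.wav")):
        shutil.move(str(f), str(root / f.name))
    shutil.rmtree(target)  # anything left in the subfolder is deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.wav").touch()
    (root / "b.wav").touch()

    moved = build(root)
    after_build = sorted(p.name for p in root.iterdir())

    reset(root)
    after_reset = sorted(p.name for p in root.iterdir())
```

Because a reset deletes everything other than the restored files, as the warning above states, it is worth copying any analysis output worth keeping before calling it.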
- run_analysis(analysis: Analysis, audio_dataset: AudioDataset | None = None) → None
Create a new analysis dataset from the original audio files.
The analysis parameter sets which type(s) of core_api dataset(s) will be created and added to the Dataset.datasets property, plus which output files will be written to disk (reshaped audio files, npz spectra matrices, png spectrograms…).
Parameters
- analysis: Analysis
Analysis to run. Contains the analysis type and required info. See the public_api.Analysis.Analysis docstring for more info.
- audio_dataset: AudioDataset | None
If provided, the analysis will be run on this AudioDataset. Otherwise, an AudioDataset will be created from the analysis parameters. This can be used to edit the analysis AudioDataset (adding/removing AudioData, etc.).
- to_dict() → dict
Serialize a dataset to a dictionary.
Returns
- dict:
The serialized dictionary representing the dataset.
- write_json(folder: Path | None = None) → None
Write a serialized Dataset to a JSON file.
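The to_dict(), from_dict(), write_json() and from_json() methods together form a serialization round trip. The sketch below reproduces that pattern with a hypothetical stand-in class, MiniDataset, so that it runs without osekit. The fields kept, the dictionary keys and the JSON_NAME file name are assumptions made for illustration and do not reflect the layout osekit actually writes.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

JSON_NAME = "dataset.json"  # hypothetical file name, not taken from osekit


@dataclass
class MiniDataset:
    """Hypothetical stand-in for Dataset; only the round-trip pattern matters."""

    folder: Path
    strptime_format: str
    depth: float = 0.0
    timezone: Optional[str] = None

    def to_dict(self) -> dict:
        # Path is not JSON-serializable, so it is stored as a string.
        return {
            "folder": str(self.folder),
            "strptime_format": self.strptime_format,
            "depth": self.depth,
            "timezone": self.timezone,
        }

    @classmethod
    def from_dict(cls, dictionary: dict) -> "MiniDataset":
        return cls(
            folder=Path(dictionary["folder"]),
            strptime_format=dictionary["strptime_format"],
            depth=dictionary["depth"],
            timezone=dictionary["timezone"],
        )

    def write_json(self, folder: Optional[Path] = None) -> None:
        # Assumption: when no folder is given, the dataset folder is used.
        target = self.folder if folder is None else folder
        (target / JSON_NAME).write_text(json.dumps(self.to_dict(), indent=2))

    @classmethod
    def from_json(cls, file: Path) -> "MiniDataset":
        return cls.from_dict(json.loads(file.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = MiniDataset(
        folder=Path(tmp), strptime_format="%Y_%m_%d_%H_%M_%S", depth=10.0
    )
    original.write_json()
    restored = MiniDataset.from_json(Path(tmp) / JSON_NAME)
```

Routing from_json() through from_dict() keeps a single place where the dictionary layout is interpreted, which is one reasonable way to keep the two deserializers consistent.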