Dataset#
- class osekit.public_api.dataset.Dataset(folder: Path, strptime_format: str | None, gps_coordinates: str | list | tuple = (0, 0), depth: float = 0.0, timezone: str | None = None, datasets: dict | None = None, job_builder: JobBuilder | None = None, instrument: Instrument | None = None, first_file_begin: Timestamp | None = None)#
Main class of the Public API.
The Dataset corresponds to a collection of audio, spectro and auxiliary core_api datasets. It has additional metadata that can be exported, e.g. to APLOSE.
Initialize a Dataset.
Parameters#
- folder: Path
Path to the folder containing the original audio files.
- strptime_format: str | None
The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.
- gps_coordinates: str | list | tuple
GPS coordinates of the location where the audio files were recorded.
- depth: float
Depth at which the audio files were recorded.
- timezone: str | None
Timezone in which the audio data will be located. If the audio file timestamps are parsed with a tz-aware strptime_format (%z or %Z code), the AudioFiles will be converted from the parsed timezone to the specified timezone.
- datasets: dict | None
Core API datasets that already belong to this dataset. Mainly used for deserialization.
- job_builder: JobBuilder | None
If None, analyses from this Dataset will be run locally. Otherwise, PBS job files will be created and submitted when analyses are run. See the osekit.job module for more info.
- instrument: Instrument | None
Instrument that might be used to obtain acoustic pressure from the wav audio data. See the osekit.core_api.instrument module for more info.
- first_file_begin: Timestamp | None
Timestamp of the first audio file being processed. Will be ignored if strptime_format is specified.
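The interplay between strptime_format, first_file_begin and timezone can be illustrated with the standard library alone. The following is a minimal sketch, not OSEkit's implementation: the helper file_begins, the filenames and the durations are hypothetical, and the format is assumed to match the whole filename.

```python
from datetime import datetime, timedelta, timezone


def file_begins(filenames, durations, strptime_format=None, first_file_begin=None):
    """Return one begin timestamp per file (illustrative helper, not OSEkit code)."""
    if strptime_format is not None:
        # Timestamps are read from the filenames; first_file_begin is ignored.
        return [datetime.strptime(name, strptime_format) for name in filenames]
    # No format: start at first_file_begin and chain the files back to back.
    begins, current = [], first_file_begin
    for duration in durations:
        begins.append(current)
        current = current + duration
    return begins


# Hypothetical filenames and durations.
names = ["20240101_120000.wav", "20240101_121000.wav"]
durations = [timedelta(minutes=10), timedelta(minutes=10)]

parsed = file_begins(names, durations, strptime_format="%Y%m%d_%H%M%S.wav")
chained = file_begins(names, durations, first_file_begin=datetime(2024, 1, 1, 12, 0, 0))

# With a tz-aware format (%z), the parsed timestamp carries an offset
# and can be converted to the requested timezone.
aware = datetime.strptime("20240101_120000+0200", "%Y%m%d_%H%M%S%z")
in_utc = aware.astimezone(timezone.utc)
```

In this example both strategies yield the same begin timestamps, because the files are exactly contiguous.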
- property analyses: list[str]#
Return the list of the names of the analyses run with this Dataset.
- build() None#
Build the Dataset.
Building a Dataset moves the original audio files to a specific folder and creates serialized json files used by APLOSE.
- build_from_files(files: Iterable[PathLike | str], *, move_files: bool = False) None#
Build the Dataset from the specified files.
The files will be copied (or moved) to the dataset.folder folder.
Parameters#
- files: Iterable[PathLike | str]
Files that are included in the dataset.
- move_files: bool
If set to True, the files will be moved (rather than copied) to the dataset folder.
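The difference between copying and moving can be sketched with shutil. This is only an illustration of the move_files semantics described above, under the assumption that files keep their names in the destination folder; place_files is a hypothetical helper, not part of OSEkit.

```python
import shutil
import tempfile
from pathlib import Path


def place_files(files, folder, *, move_files=False):
    """Copy (default) or move each file into folder; hypothetical helper."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    placed = []
    for file in files:
        source = Path(file)
        destination = folder / source.name
        if move_files:
            shutil.move(str(source), str(destination))
        else:
            shutil.copy2(source, destination)
        placed.append(destination)
    return placed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "incoming").mkdir()
    first = root / "incoming" / "a.wav"
    second = root / "incoming" / "b.wav"
    first.write_bytes(b"demo")
    second.write_bytes(b"demo")

    place_files([first], root / "dataset")                    # copy
    place_files([second], root / "dataset", move_files=True)  # move

    copied_source_kept = first.exists()
    moved_source_kept = second.exists()
    dataset_content = sorted(p.name for p in (root / "dataset").iterdir())
```

Copying leaves the source untouched, at the cost of duplicating the audio on disk; moving avoids the duplication but changes the source folder.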
- delete_analysis(analysis_name: str) None#
Delete all output datasets from an analysis.
WARNING: all the analysis output files will be deleted.
- export_analysis(analysis_type: AnalysisType, ads: AudioDataset | None = None, sds: SpectroDataset | LTASDataset | None = None, subtype: str | None = None, matrix_folder_name: str = 'matrix', spectrogram_folder_name: str = 'spectrogram', welch_folder_name: str = 'welch', nb_jobs: int = 1, name: str = 'OSEkit_analysis', *, link: bool = False) None#
Perform an analysis and write the results on disk.
An analysis is defined as a manipulation of the original audio files: reshaping the audio, exporting png spectrograms or npz matrices (or a combination of those three) are examples of analyses. If self.job_builder is not None, the tasks will be distributed to jobs; otherwise they will be run locally.
Parameters#
- spectrogram_folder_name: str
The name of the folder in which the png spectrograms will be exported (relative to sds.folder).
- matrix_folder_name: str
The name of the folder in which the npz matrices will be exported (relative to sds.folder).
- welch_folder_name: str
The name of the folder in which the npz welch files will be exported (relative to sds.folder).
- sds: SpectroDataset | LTASDataset
The SpectroDataset on which the data should be written.
- analysis_type: AnalysisType
Type of the analysis to be performed. AudioDataset and SpectroDataset instances will be created depending on the flags. See osekit.public_api.analysis.AnalysisType docstring for more information.
- ads: AudioDataset
The AudioDataset on which the data should be written.
- subtype: str | None
The subtype of the audio files as provided by the soundfile module.
- nb_jobs: int
The number of jobs to run in parallel.
- name: str
The name of the analysis being performed.
- link: bool
If True, the ads data will be linked to the exported files.
- classmethod from_dict(dictionary: dict) Dataset#
Deserialize a dataset from a dictionary.
Parameters#
- dictionary: dict
The serialized dictionary representing the dataset.
Returns#
- Dataset
The deserialized dataset.
- classmethod from_json(file: Path) Dataset#
Deserialize a Dataset from a json file.
Parameters#
- file: Path
Path to the serialized json file representing the Dataset.
Returns#
- Dataset
The deserialized Dataset.
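The to_dict / from_dict and write_json / from_json pairs follow a common round-trip pattern. The sketch below shows that pattern on a toy class; MiniDataset, its fields and the dataset.json filename are hypothetical stand-ins, not the OSEkit class or its actual serialization layout.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MiniDataset:
    """Toy stand-in for a serializable dataset (not the OSEkit class)."""

    folder: Path
    depth: float = 0.0
    datasets: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # Only JSON-friendly types go into the dictionary.
        return {"folder": str(self.folder), "depth": self.depth, "datasets": self.datasets}

    @classmethod
    def from_dict(cls, dictionary: dict) -> "MiniDataset":
        return cls(
            folder=Path(dictionary["folder"]),
            depth=dictionary["depth"],
            datasets=dictionary["datasets"],
        )

    def write_json(self, folder: Path) -> Path:
        # Hypothetical filename, chosen for this sketch only.
        file = Path(folder) / "dataset.json"
        file.write_text(json.dumps(self.to_dict(), indent=2))
        return file

    @classmethod
    def from_json(cls, file: Path) -> "MiniDataset":
        return cls.from_dict(json.loads(Path(file).read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = MiniDataset(
        folder=Path(tmp),
        depth=12.5,
        datasets={"original": {"class": "AudioDataset"}},
    )
    restored = MiniDataset.from_json(original.write_json(Path(tmp)))
```

Keeping the dictionary form separate from the file form means the same from_dict logic can serve both in-memory and on-disk deserialization.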
- get_analysis_audiodataset(analysis: Analysis) AudioDataset#
Return an AudioDataset created from the analysis parameters.
Parameters#
- analysis: Analysis
Analysis for which to generate an AudioDataset object.
Returns#
- AudioDataset:
The AudioDataset that matches the analysis parameters. This AudioDataset can be used either to have a peek at the analysis output, or to edit the analysis (adding/removing data) by editing it and passing it as a parameter to the Dataset.run_analysis() method.
- get_analysis_spectrodataset(analysis: Analysis, audio_dataset: AudioDataset | None = None) SpectroDataset | LTASDataset#
Return a SpectroDataset (or LTASDataset) created from analysis parameters.
Parameters#
- analysis: Analysis
Analysis for which to generate a SpectroDataset (or LTASDataset) object.
- audio_dataset: AudioDataset | None
If provided, the SpectroDataset will be initialized from this AudioDataset. This can be used to edit the analysis (e.g. adding/removing data) before running it.
Returns#
- SpectroDataset | LTASDataset:
The SpectroDataset that matches the analysis parameters. This SpectroDataset can be used, for example, to have a peek at the analysis output before running it. If Analysis.is_ltas is True, an LTASDataset is returned.
- get_dataset(dataset_name: str) type[DatasetChild] | None#
Get an analysis dataset from its name.
Parameters#
- dataset_name: str
Name of the analysis dataset.
Returns#
- type[DatasetChild]:
Analysis dataset from the dataset.datasets property.
- get_datasets_by_analysis(analysis_name: str) list[type[DatasetChild]]#
Get all output datasets from a given analysis.
Parameters#
- analysis_name: str
Name of the analysis of which to get the output datasets.
Returns#
- list[type[DatasetChild]]:
List of the analysis output datasets.
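The two lookups above (by dataset name, and by analysis name) can be pictured as operations on a registry dictionary. The sketch below assumes a hypothetical layout in which each entry stores the name of the analysis that produced it next to the dataset itself; the actual structure of Dataset.datasets is not specified here and may differ.

```python
def get_dataset(datasets, dataset_name):
    """Return the dataset registered under dataset_name, or None if absent."""
    entry = datasets.get(dataset_name)
    return None if entry is None else entry["dataset"]


def get_datasets_by_analysis(datasets, analysis_name):
    """Return every dataset whose entry is tagged with analysis_name."""
    return [
        entry["dataset"]
        for entry in datasets.values()
        if entry["analysis"] == analysis_name
    ]


# Hypothetical registry: strings stand in for core_api dataset objects.
registry = {
    "original": {"analysis": "original", "dataset": "original audio"},
    "my_analysis_audio": {"analysis": "my_analysis", "dataset": "reshaped audio"},
    "my_analysis": {"analysis": "my_analysis", "dataset": "spectro"},
}

found = get_dataset(registry, "my_analysis")
missing = get_dataset(registry, "unknown")
outputs = get_datasets_by_analysis(registry, "my_analysis")
```

A single analysis can produce several output datasets, which is why the second lookup returns a list while the first returns at most one dataset.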
- property origin_dataset: AudioDataset#
Return the AudioDataset from which this Dataset has been built.
- property origin_files: list[AudioFile] | None#
Return the original audio files from which this Dataset has been built.
- rename_analysis(analysis_name: str, new_analysis_name: str) None#
Rename an analysis that has already been run.
Parameters#
- analysis_name: str
Name of the analysis to rename.
- new_analysis_name: str
New name to give to the analysis.
- reset() None#
Reset the Dataset.
Resetting a dataset will move back the original audio files and the content of the other folder to the root folder. WARNING: all other files and folders will be deleted.
- run_analysis(analysis: Analysis, audio_dataset: AudioDataset | None = None, spectro_dataset: SpectroDataset | None = None, nb_jobs: int = 1) None#
Create a new analysis dataset from the original audio files.
The analysis parameter sets which type(s) of core_api dataset(s) will be created and added to the Dataset.datasets property, plus which output files will be written to disk (reshaped audio files, npz spectra matrices, png spectrograms…).
Parameters#
- analysis: Analysis
Analysis to run. Contains the analysis type and required info. See the public_api.Analysis.Analysis docstring for more info.
- audio_dataset: AudioDataset
If provided, the analysis will be run on this AudioDataset. Else, an AudioDataset will be created from the analysis parameters. This can be used to edit the analysis AudioDataset (adding, removing, renaming AudioData etc.).
- spectro_dataset: SpectroDataset
If provided, the spectral analysis will be run on this SpectroDataset. Else, a SpectroDataset will be created from the audio_dataset if provided, or from the analysis parameters. This can be used to edit the analysis SpectroDataset (adding, removing, renaming SpectroData etc.).
- nb_jobs: int
Number of jobs to run in parallel.
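The "edit, then run" workflow described for audio_dataset can be sketched as a small helper that relies only on the methods documented on this page. Two things are assumptions: run_on_subset is a hypothetical helper, and the AudioDataset is assumed to expose its items through a mutable data attribute. A mock stands in for the Dataset so the sketch runs without OSEkit or audio files.

```python
from unittest.mock import MagicMock


def run_on_subset(dataset, analysis, keep):
    """Run analysis on the audio data items selected by keep (illustrative)."""
    # Build the AudioDataset the analysis would use, without running anything.
    ads = dataset.get_analysis_audiodataset(analysis)
    # Assumption: the AudioDataset exposes its items as a mutable `data` list.
    ads.data = [item for item in ads.data if keep(item)]
    # Run the analysis on the edited AudioDataset instead of the default one.
    dataset.run_analysis(analysis, audio_dataset=ads)
    return ads


# Stand-ins so the sketch runs without OSEkit installed.
fake_dataset = MagicMock()
fake_dataset.get_analysis_audiodataset.return_value.data = [0, 1, 2, 3]
fake_analysis = object()

edited = run_on_subset(fake_dataset, fake_analysis, keep=lambda item: item % 2 == 0)
```

With a real Dataset, keep would typically inspect each audio data item (for example its begin timestamp) rather than an integer.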
- to_dict() dict#
Serialize a dataset to a dictionary.
Returns#
- dict:
The serialized dictionary representing the dataset.
- write_json(folder: Path | None = None) None#
Write a serialized Dataset to a JSON file.