Project#

class osekit.public.project.Project(folder: Path, strptime_format: str | None, gps_coordinates: str | list | tuple = (0, 0), depth: float = 0.0, timezone: str | None = None, outputs: dict | None = None, job_builder: JobBuilder | None = None, instrument: Instrument | None = None, first_file_begin: Timestamp | None = None)#

Main class of the Public API.

The Project is the class that stores the original audio dataset and from which transforms are run on this dataset to generate spectro datasets, reshaped audio datasets, etc. It has additional metadata that can be exported, e.g. to APLOSE.

Initialize a Project.

Parameters#

folder: Path

Path to the folder containing the original audio files.

strptime_format: str | None

The strptime format used in the filenames. It should use valid strftime codes (https://strftime.org/). If None, the first audio file of the folder will start at first_file_begin, and each following file will start at the end of the previous one.

gps_coordinates: str | list | tuple

GPS coordinates of the location where the audio files were recorded.

depth: float

Depth at which the audio files were recorded.

timezone: str | None

Timezone in which the audio data will be located. If the audio file timestamps are parsed with a tz-aware strptime_format (%z or %Z code), the AudioFiles will be converted from the parsed timezone to the specified timezone.

outputs: dict | None

Core API Datasets that have been exported in this project. Mainly used for deserialization.

job_builder: JobBuilder | None

If None, outputs from this Project will be run locally. Otherwise, PBS job files will be created and submitted when transforms are run. See the osekit.job module for more info.

instrument: Instrument | None

Instrument that might be used to obtain acoustic pressure from the wav audio data. See the osekit.core.instrument module for more info.

first_file_begin: Timestamp | None

Timestamp of the first audio file being processed. Will be ignored if strptime_format is specified.
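The interplay between strptime_format, first_file_begin and timezone can be sketched with the standard library alone. This is an illustration under assumptions, not OSEkit's implementation: the helper name resolve_begins, the (file name, duration in seconds) input, the rule that the file stem is exactly the timestamp, and the restriction of timezone to a UTC offset string such as "+02:00" are all hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def resolve_begins(files, strptime_format=None, first_file_begin=None, timezone=None):
    """Return the begin datetime of each (file_name, duration_s) pair.

    Hypothetical helper: it assumes the file stem is exactly the timestamp
    and that timezone, if given, is a UTC offset string such as "+02:00".
    """
    if strptime_format is not None:
        # A format is given: first_file_begin is ignored.
        target_tz = None
        if timezone is not None:
            target_tz = datetime.strptime(timezone, "%z").tzinfo
        begins = []
        for name, _ in files:
            begin = datetime.strptime(Path(name).stem, strptime_format)
            if target_tz is not None and begin.tzinfo is not None:
                # tz-aware parse (%z code): convert to the requested timezone.
                begin = begin.astimezone(target_tz)
            begins.append(begin)
        return begins

    if first_file_begin is None:
        raise ValueError("first_file_begin is required without a strptime_format")

    # No format: each file starts where the previous one ends.
    begins = []
    begin = first_file_begin
    for _, duration_s in files:
        begins.append(begin)
        begin = begin + timedelta(seconds=duration_s)
    return begins
```

With a format such as "%Y_%m_%d_%H_%M_%S", each begin comes from the file name; with no format, only the durations and first_file_begin matter.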

build() → None#

Build the Project.

Building a Project moves the original audio files to a specific folder and creates serialized json files used by APLOSE.

build_from_files(files: Iterable[PathLike | str], *, move_files: bool = False) → None#

Build the Project from the specified files.

The files will be copied (or moved) to the project.folder folder.

Parameters#

files: Iterable[PathLike|str]

Files that are included in the project.

move_files: bool

If set to True, the files will be moved (rather than copied) into the project folder.
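The difference between copying and moving can be sketched with shutil. The function name gather_files and its return value are hypothetical and only mirror the move_files switch described above; this is not the library's code.

```python
import shutil
from pathlib import Path


def gather_files(files, destination, *, move_files=False):
    """Copy (or move) each file into destination and return the new paths."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # Pick the transfer function once: move removes the source, copy2 keeps it.
    transfer = shutil.move if move_files else shutil.copy2
    new_paths = []
    for file in files:
        source = Path(file)
        target = destination / source.name
        transfer(str(source), str(target))
        new_paths.append(target)
    return new_paths
```

Copying is the safer default in this sketch, since the source files stay in place if anything goes wrong later.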

delete_output(output_name: str) → None#

Delete all output datasets produced by the previously run transform of the given name.

WARNING: all the output files will be deleted.

Parameters#

output_name: str

Name of the transform whose output to delete.

deserialize_output(output_name: str) → type[DatasetChild]#

Deserialize an output dataset from its json file.

The self.outputs property will be updated to store the deserialized dataset instead of the json file, so that the dataset is deserialized only once.

Parameters#

output_name: str

Name of the output dataset.

Returns#

type[DatasetChild]:

The deserialized output dataset.
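The "deserialize once, then reuse" behaviour can be sketched as a small cache. The class OutputRegistry and the shape of its outputs mapping (a name mapped to either a json path or a loaded object) are assumptions made for illustration, not the actual layout of Project.outputs.

```python
import json
from pathlib import Path


class OutputRegistry:
    """Hypothetical stand-in for the outputs mapping of a project."""

    def __init__(self, outputs):
        # Each value is either a Path to a json file or an already loaded object.
        self.outputs = dict(outputs)

    def deserialize_output(self, output_name):
        entry = self.outputs[output_name]
        if isinstance(entry, Path):
            # First access: load the json file and store the result in place
            # of the path, so later calls skip the file read.
            entry = json.loads(entry.read_text(encoding="utf-8"))
            self.outputs[output_name] = entry
        return entry
```

A second call with the same name returns the very same object, because the path has been replaced by the loaded value.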

export(output_type: OutputType, ads: AudioDataset | None = None, sds: SpectroDataset | LTASDataset | None = None, subtype: str | None = None, spectrum_folder_name: str = 'spectrum', spectrogram_folder_name: str = 'spectrogram', welch_folder_name: str = 'welch', nb_jobs: int = 1, name: str = 'OSEkit_transform', *, link: bool = False) → None#

Perform a transform and write the results on disk.

A transform is defined as a manipulation of the original audio files: reshaping the audio, exporting png spectrograms or exporting npz matrices (or a combination of those three) are examples of transforms. If self.job_builder is not None, the tasks will be distributed to jobs (nb_jobs sets how many run in parallel); otherwise they will be run locally.

Parameters#

spectrogram_folder_name: str

The name of the folder in which the png spectrograms will be exported (relative to sds.folder).

spectrum_folder_name: str

The name of the folder in which the npz matrices will be exported (relative to sds.folder).

welch_folder_name: str

The name of the folder in which the npz welch files will be exported (relative to sds.folder).

sds: SpectroDataset | LTASDataset

The SpectroDataset on which the data should be written.

output_type: OutputType

Type of the transform to be performed. AudioDataset and SpectroDataset instances will be created depending on the flags. See osekit.public.transform.OutputType docstring for more information.

ads: AudioDataset

The AudioDataset on which the data should be written.

subtype: str | None

The subtype of the audio files as provided by the soundfile module.

nb_jobs: int

The number of jobs to run in parallel.

name: str

The name of the transform being performed.

link: bool

If True, the ads data will be linked to the exported files.
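Combinable output flags of this kind are commonly modelled with enum.Flag. The sketch below shows how a combination of flags could decide which datasets are needed. The class SketchOutputType and its member names are hypothetical; the real members are documented in osekit.public.transform.OutputType and may differ.

```python
from enum import Flag, auto


class SketchOutputType(Flag):
    """Hypothetical flags, one per kind of output named in the description."""

    AUDIO = auto()        # reshaped audio files
    MATRIX = auto()       # npz matrices
    SPECTROGRAM = auto()  # png spectrograms


def datasets_needed(output_type):
    """Return which kinds of datasets a combination of flags would require."""
    needed = []
    if SketchOutputType.AUDIO in output_type:
        needed.append("audio")
    # Either spectral output requires a spectro dataset.
    if output_type & (SketchOutputType.MATRIX | SketchOutputType.SPECTROGRAM):
        needed.append("spectro")
    return needed
```

Flags combine with the | operator, so SketchOutputType.AUDIO | SketchOutputType.SPECTROGRAM requests both kinds of output in one call.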

classmethod from_dict(dictionary: dict) → Project#

Deserialize a project from a dictionary.

Parameters#

dictionary: dict

The serialized dictionary representing the project.

Returns#

Project

The deserialized project.

classmethod from_json(file: Path) → Project#

Deserialize a Project from a json file.

Parameters#

file: Path

Path to the serialized json file representing the Project.

Returns#

Project

The deserialized Project.

get_output(output_name: str) → type[DatasetChild] | None#

Get an output dataset from its name.

Parameters#

output_name: str

Name of the output dataset.

Returns#

type[DatasetChild]:

Output dataset from the project.outputs property.

get_output_by_transform_name(transform_name: str) → list[type[DatasetChild]]#

Get all output datasets from a given transform.

Parameters#

transform_name: str

Name of the transform whose output datasets should be returned.

Returns#

list[type[DatasetChild]]:

List of the output datasets.
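The two lookups (by dataset name, and by transform name) can be sketched on a plain dictionary. The layout used here, where each entry holds a "transform" name and a "dataset" object, is an assumption made for illustration and not necessarily how Project.outputs is organised; the function names are hypothetical too.

```python
def lookup_output(outputs, output_name):
    """Return the dataset stored under output_name, or None if it is absent."""
    entry = outputs.get(output_name)
    return None if entry is None else entry["dataset"]


def lookup_by_transform(outputs, transform_name):
    """Return every dataset that was produced by the given transform."""
    return [
        entry["dataset"]
        for entry in outputs.values()
        if entry["transform"] == transform_name
    ]


# Hypothetical content: one transform produced two datasets.
outputs = {
    "t1_audio": {"transform": "t1", "dataset": "audio dataset of t1"},
    "t1_spectro": {"transform": "t1", "dataset": "spectro dataset of t1"},
    "t2_audio": {"transform": "t2", "dataset": "audio dataset of t2"},
}
```

A single transform can produce several datasets, which is why the second lookup returns a list while the first returns at most one dataset.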

property origin_dataset: AudioDataset#

Return the AudioDataset from which this Project has been built.

property origin_files: list[AudioFile] | None#

Return the original audio files from which this Project has been built.

prepare_audio(transform: Transform) → AudioDataset#

Return an AudioDataset created from the transform parameters.

Parameters#

transform: Transform

Transform for which to generate an AudioDataset object.

Returns#

AudioDataset:

The AudioDataset that matches the transform parameters. This AudioDataset can be used either to have a peek at the transform output, or to edit the transform (adding/removing data) by editing it and passing it as a parameter to the Project.run() method.

prepare_spectro(transform: Transform, audio_dataset: AudioDataset | None = None) → SpectroDataset | LTASDataset#

Return a SpectroDataset (or LTASDataset) created from transform parameters.

Parameters#

transform: Transform

Transform for which to generate a SpectroDataset (or LTASDataset) object.

audio_dataset: AudioDataset|None

If provided, the SpectroDataset will be initialized from this AudioDataset. This can be used to edit the transform (e.g. adding/removing data) before running it.

Returns#

SpectroDataset | LTASDataset:

The SpectroDataset that matches the transform parameters. This SpectroDataset can be used, for example, to have a peek at the transform output before running it. If Transform.is_ltas is True, an LTASDataset is returned.

rename_output(output_name: str, new_output_name: str) → None#

Rename an already run transform.

Parameters#

output_name: str

Name of the transform to rename.

new_output_name: str

New name for the transform.

reset() → None#

Reset the Project.

Resetting a project will move back the original audio files and the content of the other folder to the root folder. WARNING: all other files and folders will be deleted.

run(transform: Transform, audio_dataset: AudioDataset | None = None, spectro_dataset: SpectroDataset | None = None, nb_jobs: int = 1) → None#

Create a new transform dataset from the original audio files.

The transform parameter sets which type(s) of core dataset(s) will be created and added to the Project.outputs property, plus which output files will be written to disk (reshaped audio files, npz spectra matrices, png spectrograms…).

Parameters#

transform: Transform

Transform to run. Contains the transform type and required info. See the public.transform.Transform docstring for more info.

audio_dataset: AudioDataset

If provided, the transform will be run on this AudioDataset. Else, an AudioDataset will be created from the transform parameters. This can be used to edit the transform AudioDataset (adding, removing, renaming AudioData etc.)

spectro_dataset: SpectroDataset

If provided, the spectral transform will be run on this SpectroDataset. Else, a SpectroDataset will be created from the audio_dataset if provided, or from the transform parameters. This can be used to edit the transform SpectroDataset (adding, removing, renaming SpectroData etc.)

nb_jobs: int

Number of jobs to run in parallel.
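The "prepare, edit, then run" call pattern can be illustrated with a stand-in class. SketchProject below is not the OSEkit Project: it only mimics the shape of prepare_audio() and run(), with a list of strings in place of an AudioDataset and a plain string in place of a Transform, all of which are assumptions made for the example.

```python
class SketchProject:
    """Stand-in mimicking only the call pattern of prepare_audio() and run()."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.outputs = {}

    def prepare_audio(self, transform):
        # Preview: build the data the transform would process, without running it.
        return list(self.segments)

    def run(self, transform, audio_dataset=None):
        # Use the edited dataset when one is given, else build it from the transform.
        if audio_dataset is None:
            audio_dataset = self.prepare_audio(transform)
        self.outputs[transform] = list(audio_dataset)


project = SketchProject(["seg_0", "seg_1", "seg_2"])
preview = project.prepare_audio("my_transform")
preview.remove("seg_1")  # edit the preview before running
project.run("my_transform", audio_dataset=preview)
```

Passing the edited preview back to run() processes only the remaining items, while calling run() without it rebuilds the dataset from the transform parameters.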

to_dict() → dict#

Serialize a project to a dictionary.

Returns#

dict:

The serialized dictionary representing the project.

property transforms: list[str]#

Return the list of the names of the transforms run with this Project.

write_json(folder: Path | None = None) → None#

Write a serialized Project to a JSON file.
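The to_dict/from_dict and write_json/from_json pairs follow a common round-trip pattern, sketched here on a toy class. MiniProject, its two fields and the file name it writes are hypothetical, and unlike the documented write_json (which returns None), the sketch returns the written path for convenience.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class MiniProject:
    """Toy class with two fields, used only to show the round trip."""

    name: str
    depth: float

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, dictionary):
        return cls(**dictionary)

    def write_json(self, folder):
        # Hypothetical file naming; returns the path so the caller can reload it.
        path = Path(folder) / f"{self.name}.json"
        path.write_text(json.dumps(self.to_dict()), encoding="utf-8")
        return path

    @classmethod
    def from_json(cls, file):
        content = Path(file).read_text(encoding="utf-8")
        return cls.from_dict(json.loads(content))
```

Routing from_json through from_dict keeps a single place where the dictionary layout is interpreted.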