API Documentation

class Dataset(dataset_path: str, *, gps_coordinates: str | list | tuple = (0, 0), depth: str | int = 0, timezone: str | None = None, owner_group: str | None = None, original_folder: str | None = None, local: bool = True)

Superclass used to create a dataset compatible with the rest of the package.

A dataset is a set of audio files located in a folder whose name is the dataset name. The files must be in the raw/audio/original subfolder, alongside a timestamp.csv file that lists each file name and its associated timestamp in the %Y-%m-%dT%H:%M:%S.%fZ strftime format.

This file can be created using the OSmOSE.write_timestamp function.
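
As an illustration of the timestamp format described above, the following minimal sketch formats and re-parses a timestamp with the standard library only. The recording time is a made-up value, and the sketch does not call OSmOSE.write_timestamp, whose arguments are not documented here.

```python
from datetime import datetime, timezone

# Format string quoted in the class description.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Hypothetical recording start time, used only for illustration.
start = datetime(2022, 1, 31, 14, 5, 9, 250000, tzinfo=timezone.utc)

# %f always expands to six digits (microseconds).
formatted = start.strftime(TIMESTAMP_FORMAT)
print(formatted)

# Round trip: the trailing "Z" is matched as literal text,
# so the parsed value is a naive datetime.
parsed = datetime.strptime(formatted, TIMESTAMP_FORMAT)
print(parsed)
```

Because the trailing Z is a literal character in this format rather than a %z directive, the parsed value carries no timezone information.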

__init__(dataset_path: str, *, gps_coordinates: str | list | tuple = (0, 0), depth: str | int = 0, timezone: str | None = None, owner_group: str | None = None, original_folder: str | None = None, local: bool = True) None

Instantiate the dataset with at least its path.

Parameters

dataset_path: str

The absolute path to the dataset folder. The last folder in the path will be considered as the name of the dataset.

gps_coordinates: str or list or tuple, optional, keyword-only

The GPS coordinates of the listening location. If it is of type str, it must be the name of a csv file located in data/auxiliary, otherwise a list or a tuple with the first element being the latitude coordinates and second the longitude coordinates.

owner_group: str, optional, keyword-only

The name of the group using the OSmOSE package. All files created using this dataset will be accessible to this group. This option does not work on Windows.

original_folder: str, optional, keyword-only

The path to the folder containing the original audio files. It can be set right away, passed to the build() function, or detected automatically.

Example

>>> from pathlib import Path
>>> from OSmOSE import Dataset
>>> dataset = Dataset(Path("home","user","my_dataset"), gps_coordinates = [49.2, -5], owner_group = "gosmose")

property name

str: The Dataset name. It is read-only.

property path

Path: The Dataset path. It is read-only.

property original_folder

Path: The folder containing the original audio files.

property gps_coordinates: Tuple[float, float] | Tuple[Tuple[float, float], Tuple[float, float]]

The GPS coordinates of the listening location. First element is latitude, second is longitude.

GPS coordinates are used to localize the dataset and required for some utilities, like the weather and environment utility.

Parameter

coordinates: str or list or tuple

If the coordinates are a string, it must be the name of a csv file located in data/auxiliary/instrument/, containing two columns: ‘lat’ and ‘lon’. Otherwise, they can be either a list or a tuple of two floats, the first being the latitude and the second the longitude, or a list or a tuple containing two lists or tuples of floats. In the latter case, the coordinates are treated as an area rather than a point.

Returns

The GPS coordinates as a tuple.
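
To make the point versus area distinction concrete, here is a minimal sketch of how the two accepted shapes could be told apart. The helper describe_coordinates is hypothetical and is not part of the package; the csv-file case is deliberately left out, and the meaning of the two inner pairs in the area case is not interpreted.

```python
from typing import Sequence


def describe_coordinates(coordinates: Sequence) -> str:
    """Hypothetical helper: classify coordinates as a 'point' or an 'area'."""
    if len(coordinates) != 2:
        raise ValueError("expected exactly two elements")
    first, second = coordinates
    # Two plain numbers: (latitude, longitude) of a single point.
    if all(isinstance(c, (int, float)) for c in (first, second)):
        return "point"
    # Two nested pairs: the coordinates describe an area.
    if all(isinstance(c, (list, tuple)) and len(c) == 2 for c in (first, second)):
        return "area"
    raise TypeError("expected two numbers or two pairs of numbers")


print(describe_coordinates([49.2, -5]))
print(describe_coordinates(((49.0, 49.5), (-5.2, -4.8))))
```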

property depth: int

The depth of the hydrophone, in meters.

Parameter

depth: str or int

If the depth is a string, it must be the name of a csv file located in data/auxiliary/instrument/, containing at least a ‘depth’ column.

Returns

The depth as an int.

property owner_group: str

str: The Unix group able to interact with the dataset.

build(*, original_folder: str | None = None, owner_group: str | None = None, date_template: str = '%Y-%m-%dT%H:%M:%S.%f%z', auto_normalization: bool = False, force_upload: bool = False, number_test_bad_files: int = 1, dico_aux_substring: dict = {'environment': ['insitu'], 'instrument': ['depth', 'gps']}) None

Set up the architecture of the dataset.

The following operations will be performed on the dataset. None of them are destructive:
  • Open and read the header of audio files located in raw/audio/original/.

  • Rename files containing illegal characters.

  • Generate some statistics regarding the file and dataset durations.

  • Write the raw/metadata.csv file.

  • Identify and record files with anomalies (short duration, unreadable header…).

  • Set the permission of the dataset to the osmose group.

Parameters

original_folder: str, optional, keyword-only

The name of the folder containing the original audio files. It is named “original” by convention. If none is passed, the program expects to find either a single folder in the dataset audio folder or a folder named original.

owner_group: str, optional, keyword-only

The name of the group using the osmose dataset. It will have all permissions over the dataset.

date_template: str, optional, keyword-only

The date template in strftime format. For example, 2017/02/24 has the template %Y/%m/%d. It is used to generate the timestamp.csv file automatically. Alternatively, you can call the script that creates the timestamp file first. If no template is provided, the file is assumed to already exist. In future versions, the template will be guessed automatically. For more information on strftime templates, see https://strftime.org/.

auto_normalization: bool, optional, keyword-only

If true, automatically normalize audio files if the data would cause issues downstream. The default is False.

force_upload: bool, optional, keyword-only

If true, ignore the file anomalies and build the dataset anyway. The default is False.

Returns

dataset: Dataset

The dataset object.
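
The date_template parameter follows the standard strftime directives, so a template can be checked against a sample date before calling build(). The sketch below uses only the standard library; the sample strings are made up, and how the package applies the template to file names is not shown here.

```python
from datetime import datetime

# Template and sample date taken from the parameter description.
date_template = "%Y/%m/%d"
parsed = datetime.strptime("2017/02/24", date_template)
print(parsed.isoformat())

# Default template from the build() signature. The sample string is
# hypothetical; %z consumes a UTC offset such as +0000.
default_template = "%Y-%m-%dT%H:%M:%S.%f%z"
parsed_default = datetime.strptime("2017-02-24T08:15:00.000000+0000", default_template)
print(parsed_default.utcoffset())
```

If a template does not match the sample string, strptime raises a ValueError, which makes this a quick way to validate a template.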

Example

>>> from pathlib import Path
>>> from OSmOSE import Dataset
>>> dataset = Dataset(Path("home","user","my_dataset"))
>>> dataset.build()

DONE ! your dataset is on OSmOSE platform !
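
The fallback rule described for original_folder (a single folder in the audio folder, or a folder named original) can be sketched as follows. The helper guess_original_folder is hypothetical and only mirrors the documented behavior; it is not the package's implementation. The sketch builds a throwaway folder tree so it can run on its own.

```python
import tempfile
from pathlib import Path


def guess_original_folder(audio_dir: Path) -> Path:
    """Hypothetical helper mirroring the documented fallback rule."""
    subfolders = sorted(p for p in audio_dir.iterdir() if p.is_dir())
    # Case 1: the audio folder contains exactly one folder.
    if len(subfolders) == 1:
        return subfolders[0]
    # Case 2: several folders, one of which is named "original".
    for folder in subfolders:
        if folder.name == "original":
            return folder
    raise FileNotFoundError(f"no unambiguous original folder in {audio_dir}")


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway layout: <tmp>/raw/audio/{original,resampled}
    audio_dir = Path(tmp) / "raw" / "audio"
    (audio_dir / "original").mkdir(parents=True)
    (audio_dir / "resampled").mkdir()
    detected_name = guess_original_folder(audio_dir).name

print(detected_name)
```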