API Documentation
- class Dataset(dataset_path: str, *, gps_coordinates: str | list | tuple = (0, 0), depth: str | int = 0, timezone: str | None = None, owner_group: str | None = None, original_folder: str | None = None, local: bool = True)
Superclass used to create a dataset compatible with the rest of the package.
A dataset is a set of audio files located in a folder whose name is the dataset name. The files must be in the raw/audio/original subfolder, alongside a timestamp.csv file that lists each file name with its associated timestamp in the %Y-%m-%dT%H:%M:%S.%fZ strftime format. This file can be created with the OSmOSE.write_timestamp function.
- __init__(dataset_path: str, *, gps_coordinates: str | list | tuple = (0, 0), depth: str | int = 0, timezone: str | None = None, owner_group: str | None = None, original_folder: str | None = None, local: bool = True) -> None
Instantiate the dataset. Only its path is required.
Parameters
- dataset_path: str
The absolute path to the dataset folder. The last folder in the path is used as the name of the dataset.
- gps_coordinates: str or list or tuple, optional, keyword-only
The GPS coordinates of the listening location. If it is a str, it must be the name of a csv file located in data/auxiliary; otherwise, it is a list or a tuple whose first element is the latitude and second element is the longitude.
- owner_group: str, optional, keyword-only
The name of the group using the OSmOSE package. All files created using this dataset will be accessible by this group (for example, osmose). This option does not work on Windows.
- original_folder: str, optional, keyword-only
The path to the folder containing the original audio files. It can be set here, passed to the build() method, or detected automatically.
Example
>>> from pathlib import Path
>>> from OSmOSE import Dataset
>>> dataset = Dataset(Path("/home", "user", "my_dataset"), gps_coordinates=[49.2, -5], owner_group="gosmose")
- property name
str: The Dataset name. It is read-only.
- property path
Path: The Dataset path. It is read-only.
- property original_folder
Path: The folder containing the original audio files.
- property gps_coordinates: Tuple[float, float] | Tuple[Tuple[float, float], Tuple[float, float]]
The GPS coordinates of the listening location. The first element is the latitude, the second is the longitude.
GPS coordinates are used to localize the dataset and are required by some utilities, such as the weather and environment utilities.
Parameter
- coordinates: str or list or tuple
If the coordinates are a string, it must be the name of a csv file located in data/auxiliary/instrument/ containing two columns: ‘lat’ and ‘lon’. Otherwise, they can be either a list or a tuple of two floats, the first being the latitude and the second the longitude, or a list or a tuple containing two lists or tuples of floats. In the latter case, the coordinates are treated as an area rather than a point.
Returns
The GPS coordinates as a tuple.
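The rules above for interpreting a coordinates value can be sketched as follows. This is a minimal sketch, not the package's implementation: normalize_gps_coordinates is a hypothetical helper, and reading only the first row of the csv file is an assumption.

```python
import csv
from pathlib import Path


def normalize_gps_coordinates(value, auxiliary_dir: Path):
    """Hypothetical helper: turn a gps_coordinates value into a tuple."""
    if isinstance(value, str):
        # Assumption: value names a csv file in auxiliary_dir with 'lat' and 'lon'
        # columns, and only its first row is used.
        with open(auxiliary_dir / value, newline="") as f:
            row = next(csv.DictReader(f))
        return (float(row["lat"]), float(row["lon"]))

    if not isinstance(value, (list, tuple)) or len(value) != 2:
        raise TypeError("expected a csv file name or a pair of coordinates")

    first, second = value
    if isinstance(first, (list, tuple)) and isinstance(second, (list, tuple)):
        # Two sequences of floats: the coordinates describe an area, not a point.
        return (tuple(float(x) for x in first), tuple(float(x) for x in second))

    # Two numbers: a single point, (latitude, longitude).
    return (float(first), float(second))
```

For example, normalize_gps_coordinates([49.2, -5], Path("data/auxiliary/instrument")) returns (49.2, -5.0).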
- property depth: int
The depth of the hydrophone, in meters.
Parameter
- depth: str or int
If the depth is a string, it must be the name of a csv file located in data/auxiliary/instrument/ containing at least a ‘depth’ column.
Returns
The depth as an int.
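Resolving a depth value from either an int or a csv file name might look like the sketch below. resolve_depth is hypothetical, and using only the first row of the ‘depth’ column is an assumption; the documentation only states that the column must exist.

```python
import csv
from pathlib import Path


def resolve_depth(value, auxiliary_dir: Path) -> int:
    """Hypothetical helper: turn a depth value into an int (meters)."""
    if isinstance(value, str):
        # Assumption: value names a csv file in auxiliary_dir with a 'depth'
        # column, and only its first row is used.
        with open(auxiliary_dir / value, newline="") as f:
            row = next(csv.DictReader(f))
        value = float(row["depth"])
    # int() truncates toward zero, e.g. 35.6 -> 35.
    return int(value)
```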
- property owner_group: str
str: The Unix group able to interact with the dataset.
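On Unix, giving a group access to a dataset folder comes down to changing the group ownership and the group permission bits of every file. The sketch below shows one way to do this with the standard grp, os, and stat modules. grant_group_access is hypothetical, and the exact bits (group read/write, plus execute and setgid on directories) are an assumption, since the bits OSmOSE actually sets are not documented here.

```python
import grp
import os
import stat
from pathlib import Path


def grant_group_access(root: Path, group_name: str) -> None:
    """Hypothetical helper: give group_name read/write access to everything under root.

    Unix only: the grp module and group ownership are not available on Windows.
    """
    gid = grp.getgrnam(group_name).gr_gid  # KeyError if the group does not exist
    for path in [root, *root.rglob("*")]:
        os.chown(path, -1, gid)  # -1 keeps the current owner
        extra = stat.S_IRGRP | stat.S_IWGRP
        if path.is_dir():
            # Directories need execute to be traversable; setgid makes new
            # files inherit the directory's group.
            extra |= stat.S_IXGRP | stat.S_ISGID
        os.chmod(path, stat.S_IMODE(path.stat().st_mode) | extra)
```

Changing the group of a file requires either owning it and belonging to the target group, or root privileges.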
- build(*, original_folder: str | None = None, owner_group: str | None = None, date_template: str = '%Y-%m-%dT%H:%M:%S.%f%z', auto_normalization: bool = False, force_upload: bool = False, number_test_bad_files: int = 1, dico_aux_substring: dict = {'environment': ['insitu'], 'instrument': ['depth', 'gps']}) -> None
Set up the architecture of the dataset.
- The following operations are performed on the dataset. None of them are destructive:
  - Open and read the header of the audio files located in raw/audio/original/.
  - Rename files containing illegal characters.
  - Generate statistics on the file and dataset durations.
  - Write the raw/metadata.csv file.
  - Identify and record files with anomalies (short duration, unreadable header…).
  - Set the permissions of the dataset for the osmose group.
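The header-reading and anomaly-detection steps listed above can be sketched with the standard wave module. This is an illustration, not the package's code: scan_original_folder is hypothetical, the "illegal characters" rule and the 1 s minimum duration are assumptions, only .wav files are handled, and the sketch proposes new names instead of renaming anything.

```python
import re
import wave
from pathlib import Path

# Hypothetical rule: anything other than letters, digits, '.', '_' and '-' is "illegal".
ILLEGAL_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def scan_original_folder(folder: Path, min_duration_s: float = 1.0):
    """Read the WAV headers in folder and flag anomalies, without modifying any file.

    Returns (rows, anomalies): one metadata dict per readable file, and a list of
    (file name, reason) pairs.
    """
    rows, anomalies = [], []
    for path in sorted(folder.glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as w:
                rate, frames, channels = w.getframerate(), w.getnframes(), w.getnchannels()
        except (wave.Error, EOFError) as exc:
            anomalies.append((path.name, f"unreadable header ({exc!r})"))
            continue
        if rate <= 0:
            anomalies.append((path.name, "invalid sample rate"))
            continue
        duration = frames / rate
        if duration < min_duration_s:
            anomalies.append((path.name, f"short duration: {duration:.3f} s"))
        rows.append({
            "filename": path.name,
            "proposed_name": ILLEGAL_CHARS.sub("_", path.name),  # name a rename would use
            "sample_rate": rate,
            "channels": channels,
            "duration_s": duration,
        })
    return rows, anomalies
```

The dataset duration is then sum(row["duration_s"] for row in rows), and the rows could be written out as a metadata table.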
Parameters
- original_folder: str, optional, keyword-only
The name of the folder containing the original audio files. By convention it is named “original”. If none is passed, the program expects to find either a single folder in the dataset audio folder or a folder named original.
- owner_group: str, optional, keyword-only
The name of the group using the OSmOSE dataset. It will have all permissions over the dataset.
- date_template: str, optional, keyword-only
The date template in strftime format. For example, 2017/02/24 has the template %Y/%m/%d. It is used to generate the timestamp.csv file automatically. Alternatively, you can run the script that creates the timestamp file first. If no template is provided, the method assumes that the file already exists. In future versions, the template will be guessed automatically. For more information on strftime templates, see https://strftime.org/.
- auto_normalization: bool, optional, keyword-only
If true, automatically normalize audio files when the data would cause issues downstream. The default is False.
- force_upload: bool, optional, keyword-only
If true, ignore the file anomalies and build the dataset anyway. The default is False.
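How a date_template could turn file names into a timestamp.csv file is sketched below. write_timestamp_csv and template_to_regex are hypothetical stand-ins, not OSmOSE.write_timestamp; the csv layout (no header row, columns file name then timestamp) is an assumption, and only the strftime directives listed in DIRECTIVE_REGEX are supported. The output timestamps use the %Y-%m-%dT%H:%M:%S.%fZ format described for timestamp.csv.

```python
import csv
import re
from datetime import datetime
from pathlib import Path

# Format described for the timestamp column of timestamp.csv.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Regex for each strftime directive this sketch supports; others (e.g. %z) are not handled.
DIRECTIVE_REGEX = {
    "%Y": r"\d{4}", "%y": r"\d{2}", "%m": r"\d{2}", "%d": r"\d{2}",
    "%H": r"\d{2}", "%M": r"\d{2}", "%S": r"\d{2}", "%f": r"\d{1,6}",
}


def template_to_regex(date_template: str) -> re.Pattern:
    """Build a regex that finds a date written with date_template inside a file name."""
    pattern = re.escape(date_template)
    for directive, regex in DIRECTIVE_REGEX.items():
        pattern = pattern.replace(re.escape(directive), regex)
    return re.compile(pattern)


def write_timestamp_csv(audio_dir: Path, date_template: str) -> Path:
    """Hypothetical helper: write one (file name, timestamp) row per WAV file."""
    finder = template_to_regex(date_template)
    rows = []
    for path in sorted(audio_dir.glob("*.wav")):
        match = finder.search(path.stem)
        if match is None:
            raise ValueError(f"{path.name} does not match template {date_template!r}")
        moment = datetime.strptime(match.group(0), date_template)
        rows.append([path.name, moment.strftime(TIMESTAMP_FORMAT)])
    out = audio_dir / "timestamp.csv"
    # Assumption: no header row; columns are (file name, timestamp).
    with open(out, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return out
```

Since file names cannot contain "/", a template such as %Y%m%d_%H%M%S is more realistic for file names: channelA_20170224_103000.wav then gets the timestamp 2017-02-24T10:30:00.000000Z.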
Returns
- dataset: Dataset
The dataset object.
Example
>>> from pathlib import Path
>>> from OSmOSE import Dataset
>>> dataset = Dataset(Path("/home", "user", "my_dataset"))
>>> dataset.build()
DONE ! your dataset is on OSmOSE platform !