Generate a dataset

All annotation campaigns are based on a dataset, and each dataset must be imported into APLOSE before it can be annotated.

Datasets must be located in a datasets folder.

OSEkit (recommended)

We developed our Python package OSEkit to manipulate audio files. It can directly output a dataset in the file structure that APLOSE expects.

This package can be found on GitHub: https://github.com/Project-OSmOSE/OSEkit

You can also directly access its documentation: https://project-osmose.github.io/OSEkit/

Manually

To create a dataset manually, you must understand the expected file structure and metadata files.

File structure

The root folder is named dataset.

dataset/
├── datasets.csv
├── {Acquisition campaign}/
│   ├── {Dataset}
│   └── ...
├── {Dataset}
└── ...

A dataset folder can be located either at the root or within an "Acquisition campaign" folder. Each dataset folder follows the file structure detailed below.

.
└── {Dataset}/
    ├── data/
    │   └── audio/
    │       ├── {Spectrogram duration}_{Sample rate}/
    │       │   ├── metadata.csv
    │       │   ├── timestamp.csv
    │       │   ├── {audio file name}.wav
    │       │   └── ...
    │       └── ...
    └── processed/
        └── spectrogram/
            ├── {Spectrogram duration}_{Sample rate}/
            │   ├── {NFFT}_{Window size}_{Overlap}_{frequency scale}/
            │   │   ├── metadata.csv
            │   │   └── image/
            │   │       ├── {audio file name}_{zoom level}_{index in zoom level}.png
            │   │       └── ...
            │   └── ...
            └── ...

All spectrograms must be pre-processed and available in the dataset folder. There must be at least one spectrogram image per audio file.

To recover the appropriate image in the annotator display, APLOSE builds the image name from the audio file name, the zoom level, and the index within that zoom level. Zoom levels increase as powers of 2, so following the spectrogram naming convention is essential.

Understanding the spectrogram image naming

For a "sound.wav" audio file, there must be at least a "sound_1_0.png" image: it corresponds to the first zoom level and its first (and only) image.

For a second zoom level:

  • sound_2_0.png
  • sound_2_1.png

For a third level:

  • sound_4_0.png
  • sound_4_1.png
  • sound_4_2.png
  • sound_4_3.png

And so on...
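The naming rule above can be summarized in a short sketch. The function below is illustrative (it is not part of APLOSE or OSEkit); it lists the image names APLOSE will look for, given an audio file name and a number of zoom levels:

```python
from pathlib import Path


def spectrogram_image_names(audio_filename: str, zoom_levels: int) -> list[str]:
    """List the expected spectrogram image names for one audio file.

    Zoom level k (starting at 1) contains 2**(k-1) images, and each
    filename encodes that power of two together with the image index.
    """
    stem = Path(audio_filename).stem  # "sound.wav" -> "sound"
    names = []
    for level in range(zoom_levels):
        tiles = 2 ** level  # 1, 2, 4, ... images per level
        names += [f"{stem}_{tiles}_{i}.png" for i in range(tiles)]
    return names
```

For example, `spectrogram_image_names("sound.wav", 2)` returns `["sound_1_0.png", "sound_2_0.png", "sound_2_1.png"]`.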

Metadata files

datasets.csv

Located at ./datasets.csv

This is where APLOSE searches for new datasets. The file lists the available datasets and gives the information needed to locate the audio and spectrogram files.

| Column | Type | Description |
| --- | --- | --- |
| path | string | Path from the dataset root folder (i.e. {Acquisition campaign}/{Dataset} or {Dataset}) |
| dataset | string | Name used to display the dataset in APLOSE; it can differ from the folder name and must be unique in APLOSE |
| spectro_duration | int | Duration of the files (in seconds) |
| dataset_sr | int | Sample rate (in Hz) |
| file_type | string | Type of file used (example: .wav) |
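As an illustration, a datasets.csv describing one dataset inside an acquisition campaign and one at the root could look like this (all names below are made up):

```csv
path,dataset,spectro_duration,dataset_sr,file_type
MyCampaign/MyDataset,MyDataset,10,48000,.wav
OtherDataset,OtherDataset,60,128000,.wav
```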

[Audio] metadata.csv

Located at ./{Dataset}/data/audio/{Spectrogram duration}_{Sample rate}/metadata.csv

This file describes the audio files and the processing they went through.

| Column | Type | Description |
| --- | --- | --- |
| sample_bits | array of string | File subtypes (example: ['PCM-16']) |
| channel_count | int | Number of channels in the file |
| start_date | timestamp | Start of the dataset |
| end_date | timestamp | End of the dataset |
| dataset_sr | int | Sample rate of the processed files (in Hz) |
| audio_file_dataset_duration | int | Duration of each file (in seconds) |
| audio_file_count | int | Number of files |

[Audio] timestamp.csv

Located at ./{Dataset}/data/audio/{Spectrogram duration}_{Sample rate}/timestamp.csv

This file lists all the audio files. APLOSE relies on it, rather than on the file structure, to know which files to use.

INFO

This means you can create a subset by removing files from this CSV. Conversely, if you do not want to listen to audio while annotating, you can provide this CSV without the associated WAV files.

| Column | Type | Description |
| --- | --- | --- |
| filename | string | Name of the file (with the extension) |
| timestamp | timestamp | Start of the audio file |

[Spectrogram] metadata.csv

Located at ./{Dataset}/processed/spectrogram/{Spectrogram duration}_{Sample rate}/{NFFT}_{Window size}_{Overlap}_{frequency scale}/metadata.csv

This file describes the spectrogram generation process.

| Column | Type | Description |
| --- | --- | --- |
| dataset_sr | int | Sample rate (in Hz) |
| nfft | int | Number of frequency bins in the FFT |
| window_size | int | Number of audio samples in each FFT bin |
| overlap | int | Number of overlapping samples between two FFT bins |
| colormap | string | Colormap used to generate the spectrogram |
| zoom_level | int | Number of available zoom levels |
| dynamic_min | int | Lower limit of the noise level scale (in dB) |
| dynamic_max | int | Upper limit of the noise level scale (in dB) |
| spectro_duration | int | Duration of the spectrogram (in seconds) |
| data_normalization | string | Type of data normalization (two possible values: 'instrument' if the sensitivity and gain values are available, 'zscore' otherwise) |
| hp_filter_min_freq | int | Cut-off frequency of the high-pass filter (in Hz) |
| sensitivity_dB | float | Sensitivity of the instrument (in dB) |
| peak_voltage | float | Peak voltage of the instrument (in volts) |
| spectro_normalization | string | Type of normalization for the spectrogram computation ('density' if data_normalization = 'instrument', 'spectrum' if data_normalization = 'zscore') |
| gain_dB | int | Gain of the instrument (in dB) |
| zscore_duration | string | Duration over which the noise level is averaged in the zscore configuration (in seconds) |
| window_type | string | Type of analysis window (e.g. 'Hamming', 'Hanning', 'Blackman') |
| frequency_resolution | float | Frequency resolution of the spectrogram (in Hz) |
| temporal_resolution | float | Temporal resolution of the spectrogram (in seconds) |
| audio_file_dataset_overlap | int | Temporal overlap between consecutive spectrograms (in seconds) |
| custom_frequency_scale | string | Name of the frequency scale to apply |
Available frequency scales

| Scale | Description |
| --- | --- |
| linear | Linear scale from 0 to sample rate / 2 |
| audible | Linear scale from 0 to 22 kHz |
| porp_delph | Multi-linear scale: 0-50%: 0 to 30 kHz; 50-70%: 30 kHz to 80 kHz; 70-100%: 80 kHz to sample rate / 2 |
| dual_lf_hf | Multi-linear scale: 0-50%: 0 to 22 kHz; 50-100%: 22 kHz to sample rate / 2 |