Reshaping multiple files with the Core API [1]

Create an OSEkit AudioDataset from the files on disk, by directly specifying the time-related requirements in the constructor.

We only need the folder in which the files are located: there is no need to go down to the file level.

from pathlib import Path

audio_folder = Path(r"_static/sample_audio/timestamped")

from osekit.core_api.audio_dataset import AudioDataset
from osekit.utils.audio_utils import Normalization
from pandas import Timestamp, Timedelta

audio_dataset = AudioDataset.from_folder(
    folder=audio_folder,
    strptime_format="%y%m%d_%H%M%S",
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,
    sample_rate=24_000,
    normalization=Normalization.DC_REJECT,
)
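The strptime_format argument describes how the timestamp is written in each file name. As a minimal sketch using only the standard library, the snippet below parses a timestamp written in that format. The file stem used here is hypothetical and only illustrates the format codes; it is not taken from the sample files.

```python
from datetime import datetime

# Hypothetical file stem, chosen only to illustrate the format string.
file_stem = "220925_223515"

# %y = 2-digit year, %m = month, %d = day, %H%M%S = hour, minute, second.
parsed = datetime.strptime(file_stem, "%y%m%d_%H%M%S")
print(parsed)  # 2022-09-25 22:35:15
```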

The AudioDataset object contains all the to-be-exported AudioData:

print(f"{' AUDIO DATASET ':#^60}")
print(f"{'Begin:':<30}{str(audio_dataset.begin):>30}")
print(f"{'End:':<30}{str(audio_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(audio_dataset.sample_rate):>30}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")

###################### AUDIO DATASET #######################
Begin:                                   2022-09-25 22:35:15
End:                              2022-09-25 22:36:27.500000
Sample rate:                                           24000
Nb of audio data:                                         19
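The number of AudioData and the end timestamp printed above can be cross-checked by hand. The sketch below assumes that consecutive AudioData start every data_duration * (1 - overlap), and that the last one is allowed to extend past the requested end. This is an assumption about the segmentation, consistent with the printed output, not a description of the OSEkit internals.

```python
from pandas import Timestamp, Timedelta

begin = Timestamp("2022-09-25 22:35:15")
end = Timestamp("2022-09-25 22:36:25")
data_duration = Timedelta(seconds=5)
overlap = 0.25

# Assumed hop between two consecutive segment starts: 5 s * 0.75 = 3.75 s.
hop = data_duration * (1 - overlap)

# Collect every segment start that falls before the requested end.
starts = []
current = begin
while current < end:
    starts.append(current)
    current += hop

n_segments = len(starts)
last_end = starts[-1] + data_duration

print(n_segments)  # 19
print(last_end)    # 2022-09-25 22:36:27.500000
```

Under this assumption, the last segment starts at 22:36:22.5 and lasts 5 s, which explains why the dataset ends 2.5 s after the requested end.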

We also want to skip the AudioData that fall in the gap between recordings. Such AudioData have no linked file, so their is_empty property should be True.

print(f"{' BEFORE FILTERING ':#^60}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)

# Remove the empty data by using the default AudioDataset constructor:
audio_dataset = AudioDataset([ad for ad in audio_dataset.data if not ad.is_empty])

##################### BEFORE FILTERING #####################
Nb of Empty data:                                          4
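To illustrate what "falling in the gap" means, here is a small stand-alone sketch of an interval-overlap check. The recording spans and the overlaps_any helper are hypothetical and are not part of OSEkit; the real spans depend on the sample files, and is_empty may be computed differently by the library.

```python
from pandas import Timestamp

# Hypothetical (begin, end) spans of two recordings separated by a gap.
recordings = [
    (Timestamp("2022-09-25 22:35:00"), Timestamp("2022-09-25 22:35:30")),
    (Timestamp("2022-09-25 22:35:40"), Timestamp("2022-09-25 22:36:10")),
]


def overlaps_any(seg_begin, seg_end, spans):
    """Return True if the segment shares any time with at least one span."""
    return any(
        seg_begin < span_end and span_begin < seg_end
        for span_begin, span_end in spans
    )


# A segment entirely inside the gap: no recording covers it.
segment_in_gap = (Timestamp("2022-09-25 22:35:32"), Timestamp("2022-09-25 22:35:37"))
# A segment straddling the end of the first recording: partly covered.
segment_on_edge = (Timestamp("2022-09-25 22:35:28"), Timestamp("2022-09-25 22:35:33"))

print(overlaps_any(*segment_in_gap, recordings))   # False
print(overlaps_any(*segment_on_edge, recordings))  # True
```

In this sketch, only the segment with no overlapping recording would be treated as empty; a segment that is partly covered still has a file to read from.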

The AudioDataset should now only contain non-empty AudioData:

print(f"{' AFTER FILTERING ':#^60}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)

##################### AFTER FILTERING ######################
Nb of audio data:                                         15
Nb of Empty data:                                          0

Export all the AudioData of the AudioDataset at once:

audio_dataset.write(audio_folder / "exported_files")