Reshaping multiple files with the Core API

Create an OSEkit AudioDataset from the files on disk by passing the time-related requirements (begin, end, duration of each AudioData, overlap) directly to AudioDataset.from_folder.

We only need to point to the folder in which the files are located: there is no need to go down to the file level.

from pathlib import Path

audio_folder = Path(r"_static/sample_audio/timestamped")

from osekit.core.audio_dataset import AudioDataset
from osekit.utils.audio import Normalization
from pandas import Timestamp, Timedelta

audio_dataset = AudioDataset.from_folder(
    folder=audio_folder,
    strptime_format="%y%m%d_%H%M%S",
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,
    sample_rate=24_000,
    normalization=Normalization.DC_REJECT,
)
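
The strptime_format argument uses the standard strftime/strptime directives. As a minimal sketch of what the format "%y%m%d_%H%M%S" describes, the snippet below parses a timestamp string with the standard library only. The file stem is hypothetical and chosen for illustration; how OSEkit locates the timestamp inside a longer file name is not shown here.

```python
from datetime import datetime

# Same format string as the one passed to AudioDataset.from_folder above
strptime_format = "%y%m%d_%H%M%S"

# Hypothetical file stem, for illustration only (not necessarily a real sample file)
file_stem = "220925_223450"

# %y: 2-digit year, %m: month, %d: day, %H%M%S: hour, minute, second
parsed = datetime.strptime(file_stem, strptime_format)
print(parsed)  # 2022-09-25 22:34:50
```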

The AudioDataset object contains all the to-be-exported AudioData:

print(f"{' AUDIO DATASET ':#^60}")
print(f"{'Begin:':<30}{str(audio_dataset.begin):>30}")
print(f"{'End:':<30}{str(audio_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(audio_dataset.sample_rate):>30}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")
###################### AUDIO DATASET #######################
Begin:                                   2022-09-25 22:35:15
End:                              2022-09-25 22:36:27.500000
Sample rate:                                           24000
Nb of audio data:                                         19
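
The printed values can be cross-checked by hand. The sketch below assumes that consecutive AudioData start every data_duration * (1 - overlap) seconds and that a new AudioData is created as long as its begin falls before the requested end. This is an assumption about the segmentation rule rather than a description of OSEkit internals, but it reproduces the 19 AudioData and the dataset end of 22:36:27.5 shown above. It uses the standard library instead of pandas so that it runs on its own.

```python
from datetime import datetime, timedelta

begin = datetime(2022, 9, 25, 22, 35, 15)
end = datetime(2022, 9, 25, 22, 36, 25)
data_duration = timedelta(seconds=5)
overlap = 0.25

# Time between the begins of two consecutive segments: 5 s * 0.75 = 3.75 s
hop = data_duration * (1 - overlap)

# Assumed rule: keep adding segments while their begin is before the requested end
starts = []
t = begin
while t < end:
    starts.append(t)
    t += hop

print(len(starts))                 # 19
print(starts[-1] + data_duration)  # 2022-09-25 22:36:27.500000
```

Under this assumption, the last AudioData begins at 22:36:22.5, which is why the dataset ends 2.5 s after the requested end.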

We also want to skip the AudioData that fall in the gap between recordings. Such AudioData are not linked to any file, so their is_empty property should be True.

print(f"{' BEFORE FILTERING ':#^60}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)

# Remove the empty data:
removed_data = audio_dataset.remove_empty_data(threshold=0.0)

# We can take a look at which data has been removed:
print(f"{' REMOVED DATA ':#^60}")
print(f"{'Begin':<20}{'Duration':^20}{'Fill rate':>20}")
for data in removed_data:
    print(
        f"{data.begin.strftime('%H:%M:%S'):<20}{str(data.duration):^20}{str(data.populated_ratio) + ' %':>20}"
    )
##################### BEFORE FILTERING #####################
Nb of Empty data:                                          4

####################### REMOVED DATA #######################
Begin                     Duration                 Fill rate
22:35:41              0 days 00:00:05                  0.0 %
22:35:45              0 days 00:00:05                  0.0 %
22:35:48              0 days 00:00:05                  0.0 %
22:35:52              0 days 00:00:05                  0.0 %
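
To make the idea of a fill rate concrete, here is a small standalone sketch. It does not use OSEkit: populated_ratio below is a hypothetical helper, the recordings are made-up intervals rather than the real sample files, and the rule "keep a segment only if its ratio is strictly above the threshold" is an assumption about how a threshold of 0.0 behaves.

```python
from datetime import datetime, timedelta


def populated_ratio(begin, end, files):
    """Fraction of [begin, end) covered by non-overlapping (begin, end) file intervals."""
    covered = timedelta(0)
    for f_begin, f_end in files:
        common = min(end, f_end) - max(begin, f_begin)
        if common > timedelta(0):
            covered += common
    return covered / (end - begin)


def sec(s):
    """Hypothetical shorthand: a timestamp s seconds after an arbitrary origin."""
    return datetime(2022, 1, 1) + timedelta(seconds=s)


# Made-up recordings: two 10 s files separated by a 20 s gap
files = [(sec(0), sec(10)), (sec(30), sec(40))]

# Non-overlapping 5 s segments covering the whole 40 s span
segments = [(sec(s), sec(s + 5)) for s in range(0, 40, 5)]
ratios = [populated_ratio(b, e, files) for b, e in segments]

# Assumed filtering rule: drop segments whose ratio is not above the threshold
threshold = 0.0
kept = [seg for seg, r in zip(segments, ratios) if r > threshold]

print(ratios)     # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print(len(kept))  # 4

# A segment straddling the end of the first file is only partly populated
partial = populated_ratio(sec(8), sec(13), files)
print(partial)    # 0.4
```

With a higher threshold, partly populated segments such as the last one would be dropped as well under the same assumed rule.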

The AudioDataset should now only contain non-empty AudioData:

print(f"{' AFTER FILTERING ':#^60}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)
##################### AFTER FILTERING ######################
Nb of audio data:                                         15
Nb of Empty data:                                          0

Export all the AudioData of the AudioDataset at once:

audio_dataset.write(audio_folder / "exported_files")