Reshaping multiple files with the Core API [1]
Create an OSEkit AudioDataset from the files on disk, by directly specifying the time-related requirements in the constructor.
We only need the folder in which the files are located: we don't have to dig down to the file level.
from pathlib import Path
audio_folder = Path(r"_static/sample_audio/timestamped")
from osekit.core.audio_dataset import AudioDataset
from osekit.utils.audio import Normalization
from pandas import Timestamp, Timedelta
audio_dataset = AudioDataset.from_folder(
    folder=audio_folder,
    strptime_format="%y%m%d_%H%M%S",
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,
    sample_rate=24_000,
    normalization=Normalization.DC_REJECT,
)
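The strptime_format value above is built from standard strftime/strptime codes. As a minimal sketch of what that format string matches, here is the same pattern applied with the standard library. The filename stem is hypothetical and chosen only for illustration; how OSEkit locates the timestamp inside a filename is not shown here.

```python
from datetime import datetime

# Hypothetical filename stem, used only to illustrate the format string.
stem = "220925_223450"

# %y = 2-digit year, %m = month, %d = day, %H%M%S = hour, minute, second.
parsed = datetime.strptime(stem, "%y%m%d_%H%M%S")
print(parsed)  # 2022-09-25 22:34:50
```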
The AudioDataset object contains all the to-be-exported AudioData:
print(f"{' AUDIO DATASET ':#^60}")
print(f"{'Begin:':<30}{str(audio_dataset.begin):>30}")
print(f"{'End:':<30}{str(audio_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(audio_dataset.sample_rate):>30}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")
###################### AUDIO DATASET #######################
Begin: 2022-09-25 22:35:15
End: 2022-09-25 22:36:27.500000
Sample rate: 24000
Nb of audio data: 19
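The count and the end timestamp above can be reproduced by hand. The sketch below assumes that each AudioData starts one hop of data_duration * (1 - overlap) after the previous one, and that a new AudioData is created as long as its begin falls before the requested end. This is an assumption that happens to match the printed output, not a description of OSEkit's internals.

```python
from datetime import datetime, timedelta

begin = datetime(2022, 9, 25, 22, 35, 15)
end = datetime(2022, 9, 25, 22, 36, 25)
data_duration = timedelta(seconds=5)
overlap = 0.25

# With 25 % overlap, consecutive segments start 3.75 s apart.
hop = data_duration * (1 - overlap)

# Assumed rule: keep creating segments while their begin is before `end`.
segments = []
seg_begin = begin
while seg_begin < end:
    segments.append((seg_begin, seg_begin + data_duration))
    seg_begin += hop

print(len(segments))    # 19
print(segments[-1][1])  # 2022-09-25 22:36:27.500000
```

Under this assumption the last AudioData begins at 22:36:22.5 and runs for its full 5 s, which is why the dataset ends 2.5 s after the requested end.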
We also want to skip the AudioData that fall in the gap between recordings.
Such AudioData have no linked file, so their is_empty property should be True.
print(f"{' BEFORE FILTERING ':#^60}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)
# Remove the empty data:
removed_data = audio_dataset.remove_empty_data(threshold=0.0)
# We can take a look at which data has been removed:
print(f"{' REMOVED DATA ':#^60}")
print(f"{'Begin':<20}{'Duration':^20}{'Fill rate':>20}")
for data in removed_data:
    print(
        f"{data.begin.strftime('%H:%M:%S'):<20}{str(data.duration):^20}{str(data.populated_ratio) + ' %':>20}"
    )
##################### BEFORE FILTERING #####################
Nb of Empty data: 4
####################### REMOVED DATA #######################
Begin Duration Fill rate
22:35:41 0 days 00:00:05 0.0 %
22:35:45 0 days 00:00:05 0.0 %
22:35:48 0 days 00:00:05 0.0 %
22:35:52 0 days 00:00:05 0.0 %
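To make the fill rate and the threshold more concrete, here is a rough sketch of how such a ratio could be computed. It assumes that the ratio is the fraction of a segment's duration covered by recordings, and that segments whose ratio is less than or equal to the threshold are removed. The populated_ratio function below, the file spans and the segments are all hypothetical illustration values, not OSEkit's implementation or the sample data.

```python
from datetime import datetime, timedelta


def populated_ratio(seg_begin, seg_end, file_spans):
    """Fraction of [seg_begin, seg_end) covered by (begin, end) file spans.

    Assumes the file spans do not overlap each other.
    """
    covered = timedelta(0)
    for file_begin, file_end in file_spans:
        overlap_begin = max(seg_begin, file_begin)
        overlap_end = min(seg_end, file_end)
        if overlap_end > overlap_begin:
            covered += overlap_end - overlap_begin
    return covered / (seg_end - seg_begin)


# Hypothetical recordings: two 10 s files separated by a 10 s gap.
t0 = datetime(2022, 9, 25, 22, 35, 0)
file_spans = [
    (t0, t0 + timedelta(seconds=10)),
    (t0 + timedelta(seconds=20), t0 + timedelta(seconds=30)),
]

# Hypothetical 5 s segments: full, half covered, in the gap, full.
segments = [
    (t0 + timedelta(seconds=s), t0 + timedelta(seconds=s + 5))
    for s in (0, 7.5, 12, 20)
]

threshold = 0.0
ratios = [populated_ratio(b, e, file_spans) for b, e in segments]
# Assumed rule: a segment is removed when its ratio is <= threshold.
removed = [seg for seg, r in zip(segments, ratios) if r <= threshold]

print(ratios)        # [1.0, 0.5, 0.0, 1.0]
print(len(removed))  # 1
```

With a threshold of 0.0, only the segment lying entirely in the gap is dropped; the half-covered one is kept.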
The AudioDataset should now only contain non-empty AudioData:
print(f"{' AFTER FILTERING ':#^60}")
print(f"{'Nb of audio data:':<30}{str(len(audio_dataset.data)):>30}")
print(
    f"{'Nb of Empty data:':<30}{str(len([ad for ad in audio_dataset.data if ad.is_empty])):>30}\n"
)
##################### AFTER FILTERING ######################
Nb of audio data: 15
Nb of Empty data: 0
Export all the AudioData of the AudioDataset at once:
audio_dataset.write(audio_folder / "exported_files")