Reshaping multiple files with the Public API
As always in the Public API, the first step is to build the dataset:
from pathlib import Path

from osekit.public_api.dataset import Dataset

audio_folder = Path(r"_static/sample_audio/timestamped")

dataset = Dataset(
    folder=audio_folder,
    strptime_format="%y%m%d_%H%M%S",
)
dataset.build()
2025-12-16 14:38:15,807 Building the dataset...
2025-12-16 14:38:15,808 Analyzing original audio files...
2025-12-16 14:38:15,817 Organizing dataset folder...
2025-12-16 14:38:15,821 Build done!
The Public API Dataset is now analyzed and organized:
import pandas as pd

print(f"{' DATASET ':#^60}")
print(f"{'Begin:':<30}{str(dataset.origin_dataset.begin):>30}")
print(f"{'End:':<30}{str(dataset.origin_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(dataset.origin_dataset.sample_rate):>30}\n")
print(f"{' ORIGINAL FILES ':#^60}")

pd.DataFrame(
    [
        {
            "Name": f.path.name,
            "Begin": f.begin,
            "End": f.end,
            "Sample Rate": f.sample_rate,
        }
        for f in dataset.origin_files
    ],
).set_index("Name")
######################### DATASET ##########################
Begin: 2022-09-25 22:34:50
End: 2022-09-25 22:36:50
Sample rate: 48000
###################### ORIGINAL FILES ######################
| Name | Begin | End | Sample Rate |
|---|---|---|---|
| sample_220925_223450.wav | 2022-09-25 22:34:50 | 2022-09-25 22:35:00 | 48000 |
| sample_220925_223500.wav | 2022-09-25 22:35:00 | 2022-09-25 22:35:10 | 48000 |
| sample_220925_223510.wav | 2022-09-25 22:35:10 | 2022-09-25 22:35:20 | 48000 |
| sample_220925_223520.wav | 2022-09-25 22:35:20 | 2022-09-25 22:35:30 | 48000 |
| sample_220925_223530.wav | 2022-09-25 22:35:30 | 2022-09-25 22:35:40 | 48000 |
| sample_220925_223600.wav | 2022-09-25 22:36:00 | 2022-09-25 22:36:10 | 48000 |
| sample_220925_223610.wav | 2022-09-25 22:36:10 | 2022-09-25 22:36:20 | 48000 |
| sample_220925_223620.wav | 2022-09-25 22:36:20 | 2022-09-25 22:36:30 | 48000 |
| sample_220925_223630.wav | 2022-09-25 22:36:30 | 2022-09-25 22:36:40 | 48000 |
| sample_220925_223640.wav | 2022-09-25 22:36:40 | 2022-09-25 22:36:50 | 48000 |
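The `Begin` values in the table match the timestamps embedded in the file names, read with the `strptime_format` passed to `Dataset`. The following is a minimal standard-library sketch of that idea, not OSEkit's own parsing code: the `begin_from_name` helper and the regular expression are hypothetical, and the 10 s duration is read from the table rather than from the audio file.

```python
import re
from datetime import datetime, timedelta

STRPTIME_FORMAT = "%y%m%d_%H%M%S"

# Hypothetical pattern matching the digit layout of STRPTIME_FORMAT
# (6 digits for the date, an underscore, 6 digits for the time).
TIMESTAMP_PATTERN = re.compile(r"\d{6}_\d{6}")


def begin_from_name(file_name: str) -> datetime:
    """Extract the begin timestamp embedded in a file name (illustrative helper)."""
    match = TIMESTAMP_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"No timestamp found in {file_name!r}")
    return datetime.strptime(match.group(), STRPTIME_FORMAT)


begin = begin_from_name("sample_220925_223450.wav")
# Assumption: the duration is hard-coded from the table above; in practice
# it would come from the audio file itself.
end = begin + timedelta(seconds=10)
print(begin, end)
```

With the first file of the table, this gives a begin of 2022-09-25 22:34:50 and an end of 2022-09-25 22:35:00.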
To run analyses in the Public API, use the Analysis class:
from osekit.public_api.analysis import Analysis, AnalysisType
from osekit.utils.audio_utils import Normalization
from pandas import Timestamp, Timedelta

analysis = Analysis(
    analysis_type=AnalysisType.AUDIO,  # only export the reshaped audio
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,  # consecutive 5 s windows share 25 % of their duration
    sample_rate=24_000,  # resample from the original 48 kHz to 24 kHz
    normalization=Normalization.DC_REJECT,
    name="reshape_example",
)
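Together, `begin`, `end`, `data_duration` and `overlap` define a grid of analysis windows. The sketch below is one way to picture that grid. It is not OSEkit's implementation: the `window_begins` helper is hypothetical, it uses standard-library `datetime` objects instead of pandas ones, and it assumes that a new window starts every `data_duration * (1 - overlap)` for as long as the start falls before `end`.

```python
from datetime import datetime, timedelta


def window_begins(
    begin: datetime,
    end: datetime,
    data_duration: timedelta,
    overlap: float,
) -> list[datetime]:
    """Return the start of each window (illustrative helper, not an OSEkit function)."""
    # Assumption: the hop between two windows is the non-overlapping part.
    step = data_duration * (1 - overlap)
    begins = []
    current = begin
    while current < end:
        begins.append(current)
        current += step
    return begins


data_duration = timedelta(seconds=5)
begins = window_begins(
    begin=datetime(2022, 9, 25, 22, 35, 15),
    end=datetime(2022, 9, 25, 22, 36, 25),
    data_duration=data_duration,
    overlap=0.25,
)
windows = [(b, b + data_duration) for b in begins]

print(len(windows))   # number of windows on the grid
print(windows[1][0])  # start of the second window
print(windows[-1])    # last window
```

Under these assumptions the hop is 3.75 s, which gives 19 windows. The last one starts at 22:36:22.500 and ends at 22:36:27.500, slightly after the requested `end`.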
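The Core API can still be used on top of the Public API.
The original files leave a gap between 22:35:40 and 22:36:00, so some of the requested AudioData contain no audio at all. Here, we use the Core API to filter out these empty AudioData: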
# Returns a Core API AudioDataset that matches the analysis
audio_dataset = dataset.get_analysis_audiodataset(analysis=analysis)
# Filter the returned AudioDataset
audio_dataset.data = [ad for ad in audio_dataset.data if not ad.is_empty]
2025-12-16 14:38:15,850 Creating the audio data...
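To see which windows the filter removes, the sketch below rebuilds the window grid by hand and keeps only the windows that overlap at least one recorded period. It is a standard-library illustration, not the logic behind `is_empty`: the `overlaps` helper is hypothetical, the 3.75 s hop is an assumption derived from `data_duration` and `overlap`, and the two recorded periods are read from the original files table (the ten files form two contiguous 50 s blocks).

```python
from datetime import datetime, timedelta


def overlaps(a_begin, a_end, b_begin, b_end) -> bool:
    """True if the two half-open time intervals share some duration."""
    return a_begin < b_end and b_begin < a_end


# Recorded periods, merged from the contiguous original files.
recordings = [
    (datetime(2022, 9, 25, 22, 34, 50), datetime(2022, 9, 25, 22, 35, 40)),
    (datetime(2022, 9, 25, 22, 36, 0), datetime(2022, 9, 25, 22, 36, 50)),
]

# Window grid: 5 s windows, assumed to start every 3.75 s (25 % overlap).
duration = timedelta(seconds=5)
step = timedelta(seconds=3.75)
analysis_end = datetime(2022, 9, 25, 22, 36, 25)

windows = []
current = datetime(2022, 9, 25, 22, 35, 15)
while current < analysis_end:
    windows.append((current, current + duration))
    current += step

# Keep a window as soon as it overlaps any recorded period.
kept = [w for w in windows if any(overlaps(*w, *r) for r in recordings)]
dropped = [w for w in windows if w not in kept]

print(len(kept), len(dropped))
for w_begin, w_end in dropped:
    print(w_begin.time(), w_end.time())
```

In this sketch, 4 of the 19 windows fall entirely inside the gap (they start at 22:35:41.250, 22:35:45, 22:35:48.750 and 22:35:52.500), which leaves 15 windows. A window that only partly overlaps the gap, such as the one starting at 22:35:37.500, is kept.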
Running the analysis while specifying the filtered audio_dataset will skip the empty AudioData.
dataset.run_analysis(analysis=analysis, audio_dataset=audio_dataset)
2025-12-16 14:38:15,861 Running analysis...
2025-12-16 14:38:15,861 Writing audio files...
All the new files from the analysis are stored in an AudioDataset named after analysis.name:
pd.DataFrame(
    [
        {
            "Exported file": next(iter(ad.files)).path.name,
            "Begin": ad.begin,
            "End": ad.end,
            "Sample Rate": ad.sample_rate,
        }
        for ad in dataset.get_dataset(analysis.name).data
    ],
).set_index("Exported file")
| Exported file | Begin | End | Sample Rate |
|---|---|---|---|
| 2022_09_25_22_35_15_000000.wav | 2022-09-25 22:35:15.000 | 2022-09-25 22:35:20.000 | 24000 |
| 2022_09_25_22_35_18_750000.wav | 2022-09-25 22:35:18.750 | 2022-09-25 22:35:23.750 | 24000 |
| 2022_09_25_22_35_22_500000.wav | 2022-09-25 22:35:22.500 | 2022-09-25 22:35:27.500 | 24000 |
| 2022_09_25_22_35_26_250000.wav | 2022-09-25 22:35:26.250 | 2022-09-25 22:35:31.250 | 24000 |
| 2022_09_25_22_35_30_000000.wav | 2022-09-25 22:35:30.000 | 2022-09-25 22:35:35.000 | 24000 |
| 2022_09_25_22_35_33_750000.wav | 2022-09-25 22:35:33.750 | 2022-09-25 22:35:38.750 | 24000 |
| 2022_09_25_22_35_37_500000.wav | 2022-09-25 22:35:37.500 | 2022-09-25 22:35:42.500 | 24000 |
| 2022_09_25_22_35_56_250000.wav | 2022-09-25 22:35:56.250 | 2022-09-25 22:36:01.250 | 24000 |
| 2022_09_25_22_36_00_000000.wav | 2022-09-25 22:36:00.000 | 2022-09-25 22:36:05.000 | 24000 |
| 2022_09_25_22_36_03_750000.wav | 2022-09-25 22:36:03.750 | 2022-09-25 22:36:08.750 | 24000 |
| 2022_09_25_22_36_07_500000.wav | 2022-09-25 22:36:07.500 | 2022-09-25 22:36:12.500 | 24000 |
| 2022_09_25_22_36_11_250000.wav | 2022-09-25 22:36:11.250 | 2022-09-25 22:36:16.250 | 24000 |
| 2022_09_25_22_36_15_000000.wav | 2022-09-25 22:36:15.000 | 2022-09-25 22:36:20.000 | 24000 |
| 2022_09_25_22_36_18_750000.wav | 2022-09-25 22:36:18.750 | 2022-09-25 22:36:23.750 | 24000 |
| 2022_09_25_22_36_22_500000.wav | 2022-09-25 22:36:22.500 | 2022-09-25 22:36:27.500 | 24000 |
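The exported file names in this table appear to encode each window's begin timestamp down to the microsecond. The format string below is inferred from the names shown here and is an assumption, not a documented OSEkit setting. This short standard-library sketch converts between a timestamp and such a name:

```python
from datetime import datetime

# Assumption: format inferred from the exported file names in the table.
EXPORT_FORMAT = "%Y_%m_%d_%H_%M_%S_%f"

begin = datetime(2022, 9, 25, 22, 35, 18, 750000)

# Timestamp -> file name
file_name = begin.strftime(EXPORT_FORMAT) + ".wav"
print(file_name)

# File name -> timestamp
parsed = datetime.strptime(file_name.removesuffix(".wav"), EXPORT_FORMAT)
print(parsed == begin)
```

The generated name is `2022_09_25_22_35_18_750000.wav`, which matches the second row of the table.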