Reshaping multiple files with the Public API [1]

As always in the Public API, the first step is to build the dataset:

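from pathlib import Path

from osekit.public_api.dataset import Dataset

# Folder containing the timestamped sample recordings
audio_folder = Path(r"_static/sample_audio/timestamped")

dataset = Dataset(
    folder=audio_folder,
    # Format of the timestamp embedded in each file name, e.g. 220925_223450
    strptime_format="%y%m%d_%H%M%S",
)

# Analyze the original audio files and organize the dataset folder
dataset.build()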

2025-12-16 14:38:15,807 Building the dataset...
2025-12-16 14:38:15,808 Analyzing original audio files...
2025-12-16 14:38:15,817 Organizing dataset folder...
2025-12-16 14:38:15,821 Build done!
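
The `strptime_format` argument uses Python's standard `strptime` directives. The following minimal sketch shows how a timestamp such as the one in `sample_220925_223450.wav` maps to a datetime. The regular expression and the `parse_begin` helper are hypothetical illustrations, not OSEkit's own parsing code:

```python
import re
from datetime import datetime

STRPTIME_FORMAT = "%y%m%d_%H%M%S"


def parse_begin(file_name: str) -> datetime:
    """Hypothetical helper: extract and parse the timestamp embedded in a file name."""
    # Assumption: the timestamp is the only "6 digits, underscore, 6 digits" run in the name.
    match = re.search(r"\d{6}_\d{6}", file_name)
    if match is None:
        raise ValueError(f"No timestamp found in {file_name!r}")
    return datetime.strptime(match.group(), STRPTIME_FORMAT)


begin = parse_begin("sample_220925_223450.wav")
print(begin)  # 2022-09-25 22:34:50
```

Each sample file lasts 10 seconds, so this begin timestamp and the file duration together give the Begin and End values reported for the original files.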

The Public API Dataset is now analyzed and organized:

import pandas as pd

print(f"{' DATASET ':#^60}")
print(f"{'Begin:':<30}{str(dataset.origin_dataset.begin):>30}")
print(f"{'End:':<30}{str(dataset.origin_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(dataset.origin_dataset.sample_rate):>30}\n")

print(f"{' ORIGINAL FILES ':#^60}")
pd.DataFrame(
    [
        {
            "Name": f.path.name,
            "Begin": f.begin,
            "End": f.end,
            "Sample Rate": f.sample_rate,
        }
        for f in dataset.origin_files
    ],
).set_index("Name")

######################### DATASET ##########################
Begin:                                   2022-09-25 22:34:50
End:                                     2022-09-25 22:36:50
Sample rate:                                           48000

###################### ORIGINAL FILES ######################

Name	Begin	End	Sample Rate
sample_220925_223450.wav	2022-09-25 22:34:50	2022-09-25 22:35:00	48000
sample_220925_223500.wav	2022-09-25 22:35:00	2022-09-25 22:35:10	48000
sample_220925_223510.wav	2022-09-25 22:35:10	2022-09-25 22:35:20	48000
sample_220925_223520.wav	2022-09-25 22:35:20	2022-09-25 22:35:30	48000
sample_220925_223530.wav	2022-09-25 22:35:30	2022-09-25 22:35:40	48000
sample_220925_223600.wav	2022-09-25 22:36:00	2022-09-25 22:36:10	48000
sample_220925_223610.wav	2022-09-25 22:36:10	2022-09-25 22:36:20	48000
sample_220925_223620.wav	2022-09-25 22:36:20	2022-09-25 22:36:30	48000
sample_220925_223630.wav	2022-09-25 22:36:30	2022-09-25 22:36:40	48000
sample_220925_223640.wav	2022-09-25 22:36:40	2022-09-25 22:36:50	48000

To run analyses in the Public API, use the Analysis class:

from osekit.public_api.analysis import Analysis, AnalysisType
from osekit.utils.audio_utils import Normalization
from pandas import Timestamp, Timedelta

analysis = Analysis(
    analysis_type=AnalysisType.AUDIO,  # only export the reshaped audio
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,
    sample_rate=24_000,
    normalization=Normalization.DC_REJECT,
    name="reshape_example",
)
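
To make the effect of `data_duration` and `overlap` concrete, here is a small pandas-only sketch of the segment arithmetic. It assumes that segments start at `begin`, that consecutive segments are spaced by `data_duration * (1 - overlap)`, and that segmentation continues while a segment starts before `end`. These assumptions are inferred from the exported files listed at the end of this page, and the loop is not OSEkit's implementation:

```python
from pandas import Timedelta, Timestamp

begin = Timestamp("2022-09-25 22:35:15")
end = Timestamp("2022-09-25 22:36:25")
data_duration = Timedelta(seconds=5)
overlap = 0.25

# Consecutive segments start one "hop" apart: 5 s * (1 - 0.25) = 3.75 s.
hop = data_duration * (1 - overlap)

# Assumption: a segment is created for every start time strictly before `end`.
segments = []
start = begin
while start < end:
    segments.append((start, start + data_duration))
    start += hop

print(hop.total_seconds())  # 3.75
print(len(segments))        # 19
```

Under these assumptions the last segment starts at 22:36:22.500 and ends at 22:36:27.500, which is after the requested `end`.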

The Core API can still be used alongside the Public API. Here, a few Core API calls filter out the empty AudioData, that is, the segments that fall in the gap between recordings and contain no audio:

# Returns a Core API AudioDataset that matches the analysis
audio_dataset = dataset.get_analysis_audiodataset(analysis=analysis)

# Filter the returned AudioDataset
audio_dataset.data = [ad for ad in audio_dataset.data if not ad.is_empty]

2025-12-16 14:38:15,850 Creating the audio data...
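
The sketch below illustrates which segments end up empty in this example. It assumes that a segment counts as empty when it overlaps none of the original files, and it rebuilds the segments with plain pandas rather than through OSEkit. The `has_audio` helper and the `recorded` list are hypothetical names introduced for illustration:

```python
from pandas import Timedelta, Timestamp

# Continuous stretches covered by the original files (read from the table of original files).
recorded = [
    (Timestamp("2022-09-25 22:34:50"), Timestamp("2022-09-25 22:35:40")),
    (Timestamp("2022-09-25 22:36:00"), Timestamp("2022-09-25 22:36:50")),
]

begin = Timestamp("2022-09-25 22:35:15")
end = Timestamp("2022-09-25 22:36:25")
data_duration = Timedelta(seconds=5)
hop = data_duration * (1 - 0.25)  # 3.75 s between segment starts


def has_audio(seg_begin: Timestamp, seg_end: Timestamp) -> bool:
    """Hypothetical helper: True if the segment overlaps at least one recorded stretch."""
    return any(seg_begin < rec_end and rec_begin < seg_end for rec_begin, rec_end in recorded)


segments = []
start = begin
while start < end:
    segments.append((start, start + data_duration))
    start += hop

kept = [seg for seg in segments if has_audio(*seg)]
dropped = [seg for seg in segments if not has_audio(*seg)]
print(len(kept), len(dropped))  # 15 4
```

The four dropped segments all lie inside the 20-second gap between 22:35:40 and 22:36:00, where no file was recorded.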

Running the analysis with the filtered audio_dataset skips the empty AudioData:

dataset.run_analysis(analysis=analysis, audio_dataset=audio_dataset)

2025-12-16 14:38:15,861 Running analysis...
2025-12-16 14:38:15,861 Writing audio files...

All the new files from the analysis are stored in an AudioDataset named after analysis.name:

pd.DataFrame(
    [
        {
            "Exported file": list(ad.files)[0].path.name,
            "Begin": ad.begin,
            "End": ad.end,
            "Sample Rate": ad.sample_rate,
        }
        for ad in dataset.get_dataset(analysis.name).data
    ],
).set_index("Exported file")

Exported file	Begin	End	Sample Rate
2022_09_25_22_35_15_000000.wav	2022-09-25 22:35:15.000	2022-09-25 22:35:20.000	24000
2022_09_25_22_35_18_750000.wav	2022-09-25 22:35:18.750	2022-09-25 22:35:23.750	24000
2022_09_25_22_35_22_500000.wav	2022-09-25 22:35:22.500	2022-09-25 22:35:27.500	24000
2022_09_25_22_35_26_250000.wav	2022-09-25 22:35:26.250	2022-09-25 22:35:31.250	24000
2022_09_25_22_35_30_000000.wav	2022-09-25 22:35:30.000	2022-09-25 22:35:35.000	24000
2022_09_25_22_35_33_750000.wav	2022-09-25 22:35:33.750	2022-09-25 22:35:38.750	24000
2022_09_25_22_35_37_500000.wav	2022-09-25 22:35:37.500	2022-09-25 22:35:42.500	24000
2022_09_25_22_35_56_250000.wav	2022-09-25 22:35:56.250	2022-09-25 22:36:01.250	24000
2022_09_25_22_36_00_000000.wav	2022-09-25 22:36:00.000	2022-09-25 22:36:05.000	24000
2022_09_25_22_36_03_750000.wav	2022-09-25 22:36:03.750	2022-09-25 22:36:08.750	24000
2022_09_25_22_36_07_500000.wav	2022-09-25 22:36:07.500	2022-09-25 22:36:12.500	24000
2022_09_25_22_36_11_250000.wav	2022-09-25 22:36:11.250	2022-09-25 22:36:16.250	24000
2022_09_25_22_36_15_000000.wav	2022-09-25 22:36:15.000	2022-09-25 22:36:20.000	24000
2022_09_25_22_36_18_750000.wav	2022-09-25 22:36:18.750	2022-09-25 22:36:23.750	24000
2022_09_25_22_36_22_500000.wav	2022-09-25 22:36:22.500	2022-09-25 22:36:27.500	24000
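
The exported file names in this table appear to encode each segment's begin timestamp down to the microsecond. As a hedged illustration, the pattern can be reproduced with a `strftime` format inferred from the table. Both the format string and the `exported_name` helper are assumptions made for this sketch, not a documented OSEkit setting:

```python
from pandas import Timestamp


def exported_name(begin: Timestamp) -> str:
    """Hypothetical helper: rebuild a file name from a segment's begin timestamp."""
    # Assumption: format inferred from the table (year to microseconds, underscore-separated).
    return begin.strftime("%Y_%m_%d_%H_%M_%S_%f") + ".wav"


name = exported_name(Timestamp("2022-09-25 22:35:18.750"))
print(name)  # 2022_09_25_22_35_18_750000.wav
```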