Reshaping multiple files with the Public API

As always in the Public API, the first step is to build the dataset:

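from pathlib import Path

from osekit.public_api.dataset import Dataset

# Folder containing the original audio files
audio_folder = Path(r"_static/sample_audio")

dataset = Dataset(
    folder=audio_folder,
    # Format of the timestamp embedded in each file name (e.g. 220925_223450)
    strptime_format="%y%m%d_%H%M%S",
)

# Analyze the original audio files and organize the dataset folder
dataset.build()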
2025-08-27 10:39:19,993 Building the dataset...
2025-08-27 10:39:19,994 Analyzing original audio files...
2025-08-27 10:39:20,001 Organizing dataset folder...
2025-08-27 10:39:20,004 Build done!
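The strptime_format argument tells OSEkit how to read the start time of each recording from its file name. As a rough illustration only, the hypothetical parse_begin helper below (not OSEkit's internal parser) recovers the timestamp of sample_220925_223450.wav with the standard library, assuming the timestamp is the only 6-digits, underscore, 6-digits substring in the name:

```python
import re
from datetime import datetime

STRPTIME_FORMAT = "%y%m%d_%H%M%S"  # same format string as passed to Dataset

# Hypothetical: this regex is hard-coded for the format above
# (6 digits, underscore, 6 digits); it is not derived from the format string.
TIMESTAMP_PATTERN = re.compile(r"\d{6}_\d{6}")


def parse_begin(file_name: str) -> datetime:
    """Return the begin timestamp encoded in a file name."""
    match = TIMESTAMP_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"No timestamp found in {file_name!r}")
    return datetime.strptime(match.group(), STRPTIME_FORMAT)


print(parse_begin("sample_220925_223450.wav"))  # 2022-09-25 22:34:50
```

The parsed value matches the Begin column of the original files table.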

The Public API Dataset is now analyzed and organized:

import pandas as pd

print(f"{' DATASET ':#^60}")
print(f"{'Begin:':<30}{str(dataset.origin_dataset.begin):>30}")
print(f"{'End:':<30}{str(dataset.origin_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(dataset.origin_dataset.sample_rate):>30}\n")

print(f"{' ORIGINAL FILES ':#^60}")
pd.DataFrame(
    [
        {
            "Name": f.path.name,
            "Begin": f.begin,
            "End": f.end,
            "Sample Rate": f.sample_rate,
        }
        for f in dataset.origin_files
    ],
).set_index("Name")
######################### DATASET ##########################
Begin:                                   2022-09-25 22:34:50
End:                                     2022-09-25 22:36:50
Sample rate:                                           48000

###################### ORIGINAL FILES ######################
Name                      Begin                End                  Sample Rate
sample_220925_223450.wav  2022-09-25 22:34:50  2022-09-25 22:35:00  48000
sample_220925_223500.wav  2022-09-25 22:35:00  2022-09-25 22:35:10  48000
sample_220925_223510.wav  2022-09-25 22:35:10  2022-09-25 22:35:20  48000
sample_220925_223520.wav  2022-09-25 22:35:20  2022-09-25 22:35:30  48000
sample_220925_223530.wav  2022-09-25 22:35:30  2022-09-25 22:35:40  48000
sample_220925_223600.wav  2022-09-25 22:36:00  2022-09-25 22:36:10  48000
sample_220925_223610.wav  2022-09-25 22:36:10  2022-09-25 22:36:20  48000
sample_220925_223620.wav  2022-09-25 22:36:20  2022-09-25 22:36:30  48000
sample_220925_223630.wav  2022-09-25 22:36:30  2022-09-25 22:36:40  48000
sample_220925_223640.wav  2022-09-25 22:36:40  2022-09-25 22:36:50  48000
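The original files do not cover the whole dataset span: nothing is recorded between 22:35:40 and 22:36:00. A minimal stdlib sketch that locates such gaps, using a hypothetical find_gaps helper and begin times copied from the table (each file is assumed to last 10 s, as the table shows):

```python
from datetime import datetime, timedelta

FILE_DURATION = timedelta(seconds=10)  # every original file lasts 10 s

# Begin times (minute, second) copied from the ORIGINAL FILES table above.
begin_times = [(34, 50), (35, 0), (35, 10), (35, 20), (35, 30),
               (36, 0), (36, 10), (36, 20), (36, 30), (36, 40)]
files = [
    (datetime(2022, 9, 25, 22, m, s), datetime(2022, 9, 25, 22, m, s) + FILE_DURATION)
    for m, s in begin_times
]


def find_gaps(intervals):
    """Return (end, next_begin) pairs where consecutive intervals do not touch."""
    ordered = sorted(intervals)
    return [
        (end, next_begin)
        for (_, end), (next_begin, _) in zip(ordered, ordered[1:])
        if next_begin > end
    ]


gaps = find_gaps(files)
for gap_begin, gap_end in gaps:
    print(gap_begin, "->", gap_end)  # 2022-09-25 22:35:40 -> 2022-09-25 22:36:00
```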

Analyses in the Public API are defined with the Analysis class. Here, the audio between 22:35:15 and 22:36:25 is cut into 5-second pieces:

from osekit.public_api.analysis import Analysis, AnalysisType
from pandas import Timestamp, Timedelta

analysis = Analysis(
    analysis_type=AnalysisType.AUDIO,  # only export the reshaped audio
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    name="reshape_example",
)
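The begin, end and data_duration parameters determine how the recording is reshaped: the requested 70 s span yields 70 / 5 = 14 pieces of 5 s each. A stdlib sketch of that arithmetic, assuming consecutive, non-overlapping windows (split_windows is a hypothetical helper, not an OSEkit function):

```python
from datetime import datetime, timedelta

BEGIN = datetime(2022, 9, 25, 22, 35, 15)  # analysis begin
END = datetime(2022, 9, 25, 22, 36, 25)  # analysis end
DATA_DURATION = timedelta(seconds=5)  # analysis data_duration


def split_windows(begin, end, duration):
    """Cut [begin, end) into consecutive windows of the given duration."""
    windows = []
    start = begin
    while start < end:
        # Assumption: a last, shorter window is clipped to `end`
        # (not triggered here since 70 s is a multiple of 5 s).
        stop = min(start + duration, end)
        windows.append((start, stop))
        start = stop
    return windows


windows = split_windows(BEGIN, END, DATA_DURATION)
print(len(windows))  # 14
```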

The Core API can still be used on top of the Public API. Here, Core API code filters out the empty AudioData, which cover periods without any original audio (the gap between 22:35:40 and 22:36:00):

# Returns a Core API AudioDataset that matches the analysis
audio_dataset = dataset.get_analysis_audiodataset(analysis=analysis)

# Filter the returned AudioDataset
audio_dataset.data = [ad for ad in audio_dataset.data if not ad.is_empty]
2025-08-27 10:39:20,027 Creating the audio data...
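Assuming an AudioData counts as empty when no original file overlaps its time span, the filter above should keep 10 of the 14 requested 5 s pieces, which matches the 10 files exported at the end of this page. A stdlib sketch of that overlap test on half-open intervals (hypothetical names; this is not OSEkit's is_empty implementation):

```python
from datetime import datetime, timedelta


def ts(minute, second):
    """Shorthand for a timestamp on 2022-09-25 at 22:minute:second."""
    return datetime(2022, 9, 25, 22, minute, second)


# Original files: 10 s each, begin times copied from the ORIGINAL FILES table.
file_begins = [ts(34, 50), ts(35, 0), ts(35, 10), ts(35, 20), ts(35, 30),
               ts(36, 0), ts(36, 10), ts(36, 20), ts(36, 30), ts(36, 40)]
files = [(b, b + timedelta(seconds=10)) for b in file_begins]

# 5 s windows between the analysis begin (22:35:15) and end (22:36:25).
step = timedelta(seconds=5)
windows = []
start = ts(35, 15)
while start < ts(36, 25):
    windows.append((start, start + step))
    start += step


def overlaps(a, b):
    """True if half-open intervals a=[a0, a1) and b=[b0, b1) share any time."""
    return a[0] < b[1] and b[0] < a[1]


non_empty = [w for w in windows if any(overlaps(w, f) for f in files)]
empty = [w for w in windows if w not in non_empty]

print(len(non_empty), len(empty))  # 10 4
```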

Passing the filtered audio_dataset to run_analysis makes the analysis skip the empty AudioData:

dataset.run_analysis(analysis=analysis, audio_dataset=audio_dataset)
2025-08-27 10:39:20,033 Running analysis...
2025-08-27 10:39:20,034 Writing audio files...
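Since no other sample rate is requested, each exported file keeps the original 48 kHz rate (see the Sample Rate column of the exported-files table at the end of this page) and should therefore hold 5 s × 48 000 Hz = 240 000 frames. The stdlib sketch below writes a silent file with those parameters using Python's wave module and reads the frame count back; the mono 16-bit PCM layout is an assumption, and OSEkit's own writer is not shown here:

```python
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 48_000  # Hz, the original sample rate shown in the tables
DATA_DURATION_S = 5  # seconds, matches data_duration=Timedelta(seconds=5)


def write_silent_wav(path: Path, sample_rate: int, duration_s: int) -> None:
    """Write a silent mono 16-bit PCM WAV file of the given duration."""
    n_frames = sample_rate * duration_s
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)  # assumption: mono
        wav.setsampwidth(2)  # assumption: 16-bit samples (2 bytes)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "example.wav"
    write_silent_wav(out, SAMPLE_RATE, DATA_DURATION_S)
    with wave.open(str(out), "rb") as wav:
        n_frames = wav.getnframes()
        duration_s = n_frames / wav.getframerate()

print(n_frames, duration_s)  # 240000 5.0
```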

All files exported by the analysis belong to a new AudioDataset named after analysis.name, which can be retrieved with dataset.get_dataset():

pd.DataFrame(
    [
        {
            "Exported file": list(ad.files)[0].path.name,
            "Begin": ad.begin,
            "End": ad.end,
            "Sample Rate": ad.sample_rate,
        }
        for ad in dataset.get_dataset(analysis.name).data
    ],
).set_index("Exported file")
Exported file                   Begin                End                  Sample Rate
2022_09_25_22_35_15_000000.wav  2022-09-25 22:35:15  2022-09-25 22:35:20  48000
2022_09_25_22_35_20_000000.wav  2022-09-25 22:35:20  2022-09-25 22:35:25  48000
2022_09_25_22_35_25_000000.wav  2022-09-25 22:35:25  2022-09-25 22:35:30  48000
2022_09_25_22_35_30_000000.wav  2022-09-25 22:35:30  2022-09-25 22:35:35  48000
2022_09_25_22_35_35_000000.wav  2022-09-25 22:35:35  2022-09-25 22:35:40  48000
2022_09_25_22_36_00_000000.wav  2022-09-25 22:36:00  2022-09-25 22:36:05  48000
2022_09_25_22_36_05_000000.wav  2022-09-25 22:36:05  2022-09-25 22:36:10  48000
2022_09_25_22_36_10_000000.wav  2022-09-25 22:36:10  2022-09-25 22:36:15  48000
2022_09_25_22_36_15_000000.wav  2022-09-25 22:36:15  2022-09-25 22:36:20  48000
2022_09_25_22_36_20_000000.wav  2022-09-25 22:36:20  2022-09-25 22:36:25  48000
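The exported file names appear to encode each AudioData's begin timestamp with the pattern %Y_%m_%d_%H_%M_%S_%f. This pattern is inferred from the table above, not taken from OSEkit's source. Under that assumption, the names can be reproduced with a hypothetical exported_name helper:

```python
from datetime import datetime

NAME_FORMAT = "%Y_%m_%d_%H_%M_%S_%f"  # inferred from the exported file names


def exported_name(begin: datetime) -> str:
    """Build a file name like the ones in the table from a begin timestamp."""
    return f"{begin.strftime(NAME_FORMAT)}.wav"


print(exported_name(datetime(2022, 9, 25, 22, 35, 15)))
# 2022_09_25_22_35_15_000000.wav
```

The trailing 000000 is the microsecond field (%f), which keeps names unique even for sub-second begin times.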