Reshaping multiple files with the Public API

As always in the Public API, the first step is to build the project:

from pathlib import Path

audio_folder = Path(r"_static/sample_audio/timestamped")

from osekit.public.project import Project

project = Project(
    folder=audio_folder,
    strptime_format="%y%m%d_%H%M%S",
)

project.build()
2026-03-25 15:42:30,232 Building the project...
2026-03-25 15:42:30,233 Analyzing original audio files...
2026-03-25 15:42:30,243 Organizing project folder...
2026-03-25 15:42:30,247 Build done!
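
The strptime_format passed to Project describes how the begin timestamp is encoded in each file name. The sketch below is not OSEkit's implementation; it only illustrates, with the standard library, how a name such as sample_220925_223450.wav maps to a timestamp under that format. The parse_begin helper and its regular expression are hypothetical and written for this example.

```python
import re
from datetime import datetime

STRPTIME_FORMAT = "%y%m%d_%H%M%S"  # same format string as in the Project call


def parse_begin(filename: str) -> datetime:
    """Hypothetical helper: extract and parse the timestamp found in a file name."""
    # Assumption: the timestamp is the only "6 digits, underscore, 6 digits" run.
    match = re.search(r"\d{6}_\d{6}", filename)
    if match is None:
        raise ValueError(f"No timestamp found in {filename!r}")
    return datetime.strptime(match.group(), STRPTIME_FORMAT)


print(parse_begin("sample_220925_223450.wav"))  # 2022-09-25 22:34:50
```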

The Public API Project is now analyzed and organized:

print(f"{' DATASET ':#^60}")
print(f"{'Begin:':<30}{str(project.origin_dataset.begin):>30}")
print(f"{'End:':<30}{str(project.origin_dataset.end):>30}")
print(f"{'Sample rate:':<30}{str(project.origin_dataset.sample_rate):>30}\n")

print(f"{' ORIGINAL FILES ':#^60}")
import pandas as pd

pd.DataFrame(
    [
        {
            "Name": f.path.name,
            "Begin": f.begin,
            "End": f.end,
            "Sample Rate": f.sample_rate,
        }
        for f in project.origin_files
    ],
).set_index("Name")
######################### DATASET ##########################
Begin:                                   2022-09-25 22:34:50
End:                                     2022-09-25 22:36:50
Sample rate:                                           48000

###################### ORIGINAL FILES ######################
Name                      Begin                End                  Sample Rate
sample_220925_223450.wav  2022-09-25 22:34:50  2022-09-25 22:35:00  48000
sample_220925_223500.wav  2022-09-25 22:35:00  2022-09-25 22:35:10  48000
sample_220925_223510.wav  2022-09-25 22:35:10  2022-09-25 22:35:20  48000
sample_220925_223520.wav  2022-09-25 22:35:20  2022-09-25 22:35:30  48000
sample_220925_223530.wav  2022-09-25 22:35:30  2022-09-25 22:35:40  48000
sample_220925_223600.wav  2022-09-25 22:36:00  2022-09-25 22:36:10  48000
sample_220925_223610.wav  2022-09-25 22:36:10  2022-09-25 22:36:20  48000
sample_220925_223620.wav  2022-09-25 22:36:20  2022-09-25 22:36:30  48000
sample_220925_223630.wav  2022-09-25 22:36:30  2022-09-25 22:36:40  48000
sample_220925_223640.wav  2022-09-25 22:36:40  2022-09-25 22:36:50  48000

To run transforms in the Public API, use the Transform class:

from osekit.public.transform import Transform, OutputType
from osekit.utils.audio import Normalization
from pandas import Timestamp, Timedelta

transform = Transform(
    output_type=OutputType.AUDIO,  # only export the reshaped audio
    begin=Timestamp("2022-09-25 22:35:15"),
    end=Timestamp("2022-09-25 22:36:25"),
    data_duration=Timedelta(seconds=5),
    overlap=0.25,
    sample_rate=24_000,
    normalization=Normalization.DC_REJECT,
    name="reshape_example",
)
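
To make the effect of data_duration and overlap concrete, here is a minimal sketch of the time grid these parameters imply. It uses only the standard library and does not call OSEkit. The rule that a window is kept as long as it starts before end is an assumption inferred from the exported files listed at the end of this page, not a documented guarantee.

```python
from datetime import datetime, timedelta

begin = datetime(2022, 9, 25, 22, 35, 15)
end = datetime(2022, 9, 25, 22, 36, 25)
data_duration = timedelta(seconds=5)
overlap = 0.25

# With 25 % overlap, consecutive windows start 75 % of a window apart.
hop = data_duration * (1 - overlap)

# Assumption: a window is created for every start time strictly before `end`.
windows = []
start = begin
while start < end:
    windows.append((start, start + data_duration))
    start += hop

print(hop)           # 0:00:03.750000
print(len(windows))  # 19
```

Four of these 19 windows fall entirely in the gap between 22:35:40 and 22:36:00, where no original file exists, which is consistent with the 15 files exported at the end.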

The Core API can still be used on top of the Public API. Here, we use it to filter out the empty AudioData, that is, the windows that fall in the gap between recordings, where no original file provides any samples:

# Returns a Core API AudioDataset that matches the transform
audio_dataset = project.prepare_audio(transform=transform)

# Filter the returned AudioDataset
removed_data = audio_dataset.remove_empty_data(threshold=0.0)

# We can take a look at which data has been removed:
print(f"{' REMOVED DATA ':#^60}")
print(f"{'Begin':<20}{'Duration':^20}{'Fill rate':>20}")
for data in removed_data:
    print(
        f"{data.begin.strftime('%H:%M:%S'):<20}{str(data.duration):^20}{str(data.populated_ratio) + ' %':>20}"
    )
2026-03-25 15:42:30,277 Creating the audio data...
####################### REMOVED DATA #######################
Begin                     Duration                 Fill rate
22:35:41              0 days 00:00:05                  0.0 %
22:35:45              0 days 00:00:05                  0.0 %
22:35:48              0 days 00:00:05                  0.0 %
22:35:52              0 days 00:00:05                  0.0 %
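
The removed windows can be cross-checked by hand. The sketch below is a standalone illustration, not the Core API: it rebuilds the window grid from the transform parameters, then computes the fraction of each window covered by the original recordings (22:34:50 to 22:35:40 and 22:36:00 to 22:36:50, as listed above). The coverage_ratio function is hypothetical, and treating a ratio less than or equal to the threshold as "empty" is an assumption about how remove_empty_data behaves.

```python
from datetime import datetime, timedelta

# Contiguous spans covered by the original files (taken from the file table).
covered = [
    (datetime(2022, 9, 25, 22, 34, 50), datetime(2022, 9, 25, 22, 35, 40)),
    (datetime(2022, 9, 25, 22, 36, 0), datetime(2022, 9, 25, 22, 36, 50)),
]

# Window grid: 5 s windows with 25 % overlap, i.e. a 3.75 s hop.
begin = datetime(2022, 9, 25, 22, 35, 15)
end = datetime(2022, 9, 25, 22, 36, 25)
duration = timedelta(seconds=5)
hop = duration * 0.75
windows = []
start = begin
while start < end:
    windows.append((start, start + duration))
    start += hop


def coverage_ratio(start: datetime, stop: datetime) -> float:
    """Hypothetical helper: fraction of [start, stop] covered by recordings."""
    total = timedelta(0)
    for c_start, c_stop in covered:
        lo, hi = max(start, c_start), min(stop, c_stop)
        if hi > lo:
            total += hi - lo
    return total / (stop - start)


# Assumption: a window is "empty" when its ratio does not exceed the threshold.
threshold = 0.0
empty = [w for w in windows if coverage_ratio(*w) <= threshold]
for w_start, _ in empty:
    print(w_start.strftime("%H:%M:%S"))
```

This prints the same four begin times as the REMOVED DATA output above and leaves 15 windows to export.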

Passing the filtered audio_dataset to project.run() makes the transform skip the empty AudioData:

project.run(transform=transform, audio_dataset=audio_dataset)
2026-03-25 15:42:30,290 Running transform...
2026-03-25 15:42:30,291 Writing audio files...

All the new files from the transform are stored in an AudioDataset named after transform.name:

pd.DataFrame(
    [
        {
            "Exported file": list(ad.files)[0].path.name,
            "Begin": ad.begin,
            "End": ad.end,
            "Sample Rate": ad.sample_rate,
        }
        for ad in project.get_output(transform.name).data
    ],
).set_index("Exported file")
Exported file                   Begin                    End                      Sample Rate
2022_09_25_22_35_15_000000.wav  2022-09-25 22:35:15.000  2022-09-25 22:35:20.000  24000
2022_09_25_22_35_18_750000.wav  2022-09-25 22:35:18.750  2022-09-25 22:35:23.750  24000
2022_09_25_22_35_22_500000.wav  2022-09-25 22:35:22.500  2022-09-25 22:35:27.500  24000
2022_09_25_22_35_26_250000.wav  2022-09-25 22:35:26.250  2022-09-25 22:35:31.250  24000
2022_09_25_22_35_30_000000.wav  2022-09-25 22:35:30.000  2022-09-25 22:35:35.000  24000
2022_09_25_22_35_33_750000.wav  2022-09-25 22:35:33.750  2022-09-25 22:35:38.750  24000
2022_09_25_22_35_37_500000.wav  2022-09-25 22:35:37.500  2022-09-25 22:35:42.500  24000
2022_09_25_22_35_56_250000.wav  2022-09-25 22:35:56.250  2022-09-25 22:36:01.250  24000
2022_09_25_22_36_00_000000.wav  2022-09-25 22:36:00.000  2022-09-25 22:36:05.000  24000
2022_09_25_22_36_03_750000.wav  2022-09-25 22:36:03.750  2022-09-25 22:36:08.750  24000
2022_09_25_22_36_07_500000.wav  2022-09-25 22:36:07.500  2022-09-25 22:36:12.500  24000
2022_09_25_22_36_11_250000.wav  2022-09-25 22:36:11.250  2022-09-25 22:36:16.250  24000
2022_09_25_22_36_15_000000.wav  2022-09-25 22:36:15.000  2022-09-25 22:36:20.000  24000
2022_09_25_22_36_18_750000.wav  2022-09-25 22:36:18.750  2022-09-25 22:36:23.750  24000
2022_09_25_22_36_22_500000.wav  2022-09-25 22:36:22.500  2022-09-25 22:36:27.500  24000

# Reset the project to move all files back to their original location.
project.reset()