Split audio files stochastically in Python

Hello, is there a way to split audio files stochastically? So far I have managed to split the audio files into 10-second snippets with the code below. I would appreciate any help.

from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file('C:/Users/XY/Desktop/input/HouseSample.wav')
chunk_length_ms = 10000  # pydub works in milliseconds
chunks = make_chunks(myaudio, chunk_length_ms)  # make chunks of 10 seconds
for i, chunk in enumerate(chunks):
    chunk_name = '{0}.wav'.format(i)
    print('exporting', chunk_name)
    chunk.export(chunk_name, format='wav')


Solution 1:[1]

So, let's say you want several chunk sizes per file.
In the simplest form, you'll need two things:

  • a new for loop over the chunk sizes
  • a list with all the chunk sizes

from pydub import AudioSegment
from pydub.utils import make_chunks

myaudio = AudioSegment.from_file('C:/Users/XY/Desktop/input/HouseSample.wav')
chunk_sizes = [10000]  # pydub works in milliseconds
for chunk_length_ms in chunk_sizes:
    chunks = make_chunks(myaudio, chunk_length_ms)  # chunks of chunk_length_ms each
    for i, chunk in enumerate(chunks):
        # Prefix with the chunk size so different splits don't overwrite each other
        chunk_name = '{0}ms_{1}.wav'.format(chunk_length_ms, i)
        print('exporting', chunk_name)
        chunk.export(chunk_name, format='wav')

For now, this code produces the same split as the one you already have.
To get multiple splits, simply add more values to the chunk_sizes list, e.g. chunk_sizes = [10000, 5000] for 10-second and 5-second splits.
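To make this concrete, here is a sketch in plain Python (no pydub needed) that computes the (start, end) millisecond bounds of each piece for every chunk size. It is meant to mirror what make_chunks does as far as I understand it: every chunk is exactly chunk_ms long except possibly the last one. The helper names chunk_bounds and chunk_names and the `<size>ms_<index>.wav` naming scheme are hypothetical; putting the size in the name keeps the 10 s and 5 s splits from overwriting each other's files.

```python
import math


def chunk_bounds(total_ms, chunk_ms):
    """Return (start_ms, end_ms) pairs for consecutive fixed-size chunks.

    Intended to mirror pydub.utils.make_chunks: all chunks are chunk_ms
    long except possibly the last, which holds whatever audio remains.
    """
    n_chunks = math.ceil(total_ms / chunk_ms)
    return [(i * chunk_ms, min((i + 1) * chunk_ms, total_ms)) for i in range(n_chunks)]


def chunk_names(total_ms, chunk_sizes):
    """Build one unique (hypothetical) file name per chunk, per chunk size."""
    names = []
    for chunk_ms in chunk_sizes:
        for i, _ in enumerate(chunk_bounds(total_ms, chunk_ms)):
            # Include the chunk size so different splits get different names
            names.append('{0}ms_{1}.wav'.format(chunk_ms, i))
    return names


# A hypothetical 25-second file split at 10 s and at 5 s
print(chunk_bounds(25000, 10000))
print(chunk_names(25000, [10000, 5000]))
```

With a real file, each (start, end) pair corresponds to myaudio[start:end], since pydub slices AudioSegment objects in milliseconds.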

If you want to add some randomness, you can rely on any pseudo-random generator, such as Python's random module or numpy.random.

A small example that draws 5 different chunk sizes between 5 s and 10 s:

import random

N_SPLIT = 5
chunk_sizes = []
for _ in range(N_SPLIT):
    chunk_sizes.append(random.randint(5000, 10000))
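The list above holds five independent chunk sizes, each of which is then applied to the whole file. If "stochastically" instead means cutting one file into consecutive pieces whose lengths vary at random, a minimal sketch could look like the following. It only computes millisecond bounds in plain Python; the function name random_bounds and its defaults are hypothetical, and the last piece is simply whatever audio remains, so it may be shorter than min_ms.

```python
import random


def random_bounds(total_ms, min_ms=5000, max_ms=10000, rng=None):
    """Cut [0, total_ms) into consecutive pieces of random length.

    Each piece length is drawn with rng.randint(min_ms, max_ms), which
    includes both endpoints; the final piece is truncated to the end of
    the audio and may therefore be shorter.
    """
    if rng is None:
        rng = random.Random()
    bounds = []
    start = 0
    while start < total_ms:
        length = rng.randint(min_ms, max_ms)
        end = min(start + length, total_ms)
        bounds.append((start, end))
        start = end
    return bounds


# A hypothetical 60-second file, with a seeded generator for repeatability
bounds = random_bounds(60000, rng=random.Random(42))
for i, (start, end) in enumerate(bounds):
    print(i, start, end, end - start)
```

Each (start, end) pair can then be cut with pydub as myaudio[start:end] and written with .export(...), as in the loops above.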

Beware: if you need this split to be consistent across your dataset, you'll need to use the same randomized chunk_sizes list for every file, so it might be useful to set a seed here (e.g. random.seed(42)).
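One way to follow that advice, sketched under the assumption that the sizes are drawn once and shared by every file, is to use a dedicated random.Random(seed) instance instead of seeding the global generator with random.seed(42). This is a swapped-in variant of the suggestion above, not the only option; make_chunk_sizes and the file names are hypothetical.

```python
import random


def make_chunk_sizes(n_split=5, min_ms=5000, max_ms=10000, seed=42):
    """Draw n_split chunk lengths in ms from a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator: leaves the global random state alone
    return [rng.randint(min_ms, max_ms) for _ in range(n_split)]


# Draw once, then reuse the same list for every file in the dataset
chunk_sizes = make_chunk_sizes()
for path in ['a.wav', 'b.wav']:  # hypothetical file names
    print(path, chunk_sizes)
```

Because the generator is re-created from the same seed, calling make_chunk_sizes() again (for example in a later run) returns the identical list.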

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1