'Librosa failing to plot mfcc generated
I'm was being able to generate MFCC from system captured audio and plot it, but after some refactor and configuring Tensorflow with CUDA. I used Librosa to generated the mfcc, matplotlib.pyplot with librosa.display to plot the MFCC and sounddevice capturing sound from Stereo mix from windows. The current configuration can create and plot MFCC from sample .wav files but when using system captured sounds it's not able to plot it since its generating a 3D array instead of a 2D when running MFCC. Here is the code that generates and plots
N_MFCC = 40
N_MELS = 40
N_FFT = 512
HOP_LENGTH = 160
MIN_FREQ = 0
MAX_FREQ = None
def create_mfcc(record, sample_rate):
features = librosa.feature.mfcc(record, sample_rate, n_fft=N_FFT,n_mfcc=N_MFCC,
n_mels=N_MELS,hop_length=HOP_LENGTH,fmin=MIN_FREQ, fmax=MAX_FREQ, htk=False)
return features
def plot_and_save_mfcc(mfcc_data, file_name, sample_rate):
plt.figure(figsize=(10, 8))
plt.title('Current audio MFCC', fontsize=18)
plt.xlabel('Time [s]', fontsize=18)
librosa_display.specshow(mfcc_data, sr=sample_rate)
plt.savefig(file_name)
plt.cla()
This generates this stack trace
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 63, in main
start_listening_and_creating_mfcc()
File "main.py", line 48, in start_listening_and_creating_mfcc
plot_and_save_mfcc(mfcc_data, conf.DEFAULT_MFCC_IMAGE_NAME.format(image_count), conf.SAMPLE_RATE)
File "main.py", line 38, in plot_and_save_mfcc
librosa_display.specshow(mfcc_data, sr=sample_rate)
File anaconda3\lib\site-packages\librosa\util\decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File anaconda3\lib\site-packages\librosa\display.py", line 879, in specshow
out = axes.pcolormesh(x_coords, y_coords, data, **kwargs)
File anaconda3\lib\site-packages\matplotlib\__init__.py", line 1361, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 6183, in pcolormesh
X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
File anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 5671, in _pcolorargs
nrows, ncols = C.shape
ValueError: too many values to unpack (expected 2)
I did try debug it and change mfcc configuration, but no success. Also did try to reconfigure my environment but this didn't help either.
EDIT: Here is the mfcc.Shapes for the System audio (48000, 40, 1) And for the .wav sample files (40, 122)
As mentioned I left a function out of the question but here is it and the function the is used to load and create mfcc for the .wav files
def create_mfcc_from_file(file_path):
(signal, sample_rate) = librosa.load(file_path)
librosa_features = create_mfcc(signal, sample_rate)
plot_and_save_mfcc(librosa_features, 'mfcc-librosa', sample_rate)
def start_listening_and_creating_mfcc():
image_count = 0
while True:
my_recording = record_window()
mfcc_data = create_mfcc(my_recording, conf.SAMPLE_RATE)
plot_and_save_mfcc(mfcc_data, conf.DEFAULT_MFCC_IMAGE_NAME.format(image_count), conf.SAMPLE_RATE)
wav.write(conf.DEFAULT_MFCC_IMAGE_NAME.format(image_count) + '.wav', conf.SAMPLE_RATE, my_recording)
image_count += 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
