Get the pronunciation correctness of two audio files using MFCC and DTW

First of all, I don't really understand what I am doing with this code. I simply want to compare two .wav files and check pronunciation correctness. I searched the internet and found that this can be done using MFCC and DTW. I found some sample code and it works fine, but I want to get the distance between the two sounds as a percentage. Can anyone help me with this, please? Also, how should I read this result? A value of 0.0 means the original file and the testing file are the same, so a lower number is better, right?

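import librosa
from dtw import dtw
from numpy.linalg import norm

# Load both recordings (librosa resamples to 22050 Hz by default)
y1, sr1 = librosa.load('original.wav')
y2, sr2 = librosa.load('testing_file.wav')

# MFCC matrices of shape (n_mfcc, frames); recent librosa versions require keyword arguments
mfcc1 = librosa.feature.mfcc(y=y1, sr=sr1)
mfcc2 = librosa.feature.mfcc(y=y2, sr=sr2)

# Align the frame sequences with DTW, using the L1 norm as the per-frame distance
dist, cost, acc_cost, path = dtw(mfcc1.T, mfcc2.T, dist=lambda x, y: norm(x - y, ord=1))
print('Normalized distance between the two sounds:', dist)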
#Normalized distance between the two sounds: 52367.556983947754
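
On reading the number: a DTW distance of 0.0 means the two feature sequences are identical, and larger values mean they differ more, so lower is closer. The distance has no upper bound, so there is no standard way to turn it into a percentage. The sketch below shows one possible mapping, not an established pronunciation metric. It uses a small NumPy-only DTW so it runs without audio files; dividing the cost by the combined sequence length, the exponential mapping, and the `scale` constant are all assumptions made for illustration, and `dtw_distance` and `similarity_percent` are hypothetical helper names. In practice `scale` would have to be tuned on recordings you already consider good and bad.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW cost between two feature sequences of shape (frames, features),
    e.g. mfcc.T. The cost is divided by len(a) + len(b) (an assumption)
    so that recordings of different length stay comparable."""
    n, m = len(a), len(b)
    # Pairwise L1 distance between every frame of a and every frame of b.
    cost = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    # acc[i, j] is the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

def similarity_percent(distance, scale):
    """Map a distance in [0, inf) to a score in (0, 100].
    `scale` is a hypothetical tuning constant, not a standard value."""
    return 100.0 * float(np.exp(-distance / scale))

# Random arrays stand in for MFCC frames (13 coefficients per frame).
rng = np.random.default_rng(0)
reference = rng.normal(size=(40, 13))
attempt = rng.normal(size=(55, 13))

# Identical sequences have distance 0, which maps to 100 percent.
same = similarity_percent(dtw_distance(reference, reference), scale=10.0)
other = similarity_percent(dtw_distance(reference, attempt), scale=10.0)
print(same, other)
```

With real recordings, `mfcc1.T` and `mfcc2.T` from the code above would take the place of the random arrays. A score like this only measures how close the MFCC sequences are; differences in speaker, microphone, or background noise will lower it even when the pronunciation is correct.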

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
