Conversion between librosa.load() & pydub.AudioSegment.raw_data

I'm slightly stuck: I can't work out how to convert the raw bytes in pydub.AudioSegment.raw_data (which I turn into an np.ndarray with np.frombuffer) into the same form as the data returned by librosa.load().

I'm running an FFT on the data, and for some reason the result isn't accurate when the data comes from pydub.AudioSegment. I need to convert the data to the same type that librosa.load() provides, but I can't work out what the difference is.

Here's my code:

import librosa
import numpy as np
import pydub

if __name__ == '__main__':
    filename = "data/E.wav"

    data, sr = librosa.load(filename)
    print("Librosa data: {}".format(data))

    aseg = pydub.AudioSegment.from_file(filename)
    aseg_data = np.frombuffer(aseg.raw_data, dtype=np.float64)
    print("AudioSegment data: {}".format(aseg_data))

And my output is:

Librosa data: [-0.00208831 -0.00306132 -0.00268072 ...  0.00057438  0.00082464 0.00097628]

AudioSegment data: [-7.37285249e+306 -7.37286320e+306 -7.02174591e+306 ...  3.19859460e-308  4.45027928e-308  4.03300579e-308]
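A likely explanation, offered as an assumption rather than as the accepted answer: raw_data holds interleaved signed integer PCM samples (16-bit little-endian for a typical WAV file), so reading those bytes as np.float64 reinterprets every 8 bytes of integers as one meaningless double. That would produce exactly the kind of huge and tiny magnitudes shown above. librosa.load(), by contrast, returns float32 samples scaled to [-1, 1], mixed down to mono and, by default, resampled to 22050 Hz. The sketch below handles the dtype, scaling and mono steps but does not resample; the helper name pcm_bytes_to_float is hypothetical, and it assumes signed little-endian samples of 1, 2 or 4 bytes.

```python
import numpy as np

# Map pydub's sample_width (bytes per sample) to a signed little-endian dtype.
# Assumption: samples are signed little-endian PCM, as in a typical WAV file.
_PCM_DTYPES = {1: "i1", 2: "<i2", 4: "<i4"}


def pcm_bytes_to_float(raw_data: bytes, sample_width: int, channels: int) -> np.ndarray:
    """Convert interleaved integer PCM bytes (e.g. AudioSegment.raw_data)
    to mono float32 samples in [-1.0, 1.0), similar to librosa.load(mono=True)."""
    if sample_width not in _PCM_DTYPES:
        raise ValueError(f"unsupported sample width: {sample_width} bytes")
    # Interpret the bytes as integers, not as float64 like in the question
    ints = np.frombuffer(raw_data, dtype=_PCM_DTYPES[sample_width])
    # Divide by the full-scale value 2**(bits - 1) so samples land in [-1.0, 1.0)
    full_scale = float(1 << (8 * sample_width - 1))
    samples = ints.astype(np.float32) / full_scale
    if channels > 1:
        # Frames are interleaved (L, R, L, R, ...): one row per frame, then average
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples
```

With pydub this would be called as pcm_bytes_to_float(aseg.raw_data, aseg.sample_width, aseg.channels). pydub's aseg.get_array_of_samples() is another way to get correctly typed integers before scaling. Because the sketch skips resampling, the fair comparison is with librosa.load(filename, sr=None), which keeps the file's native rate (aseg.frame_rate), not with the default 22050 Hz output.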

Link to audio I'm using - link
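A second possible source of FFT "inaccuracy", again stated as an assumption: an FFT's bins only map to real frequencies once the sample rate is known, with bin k of an n-point FFT sitting at k * sr / n Hz. Since librosa.load() resamples to 22050 Hz by default while pydub keeps the file's own frame_rate, pairing samples with the wrong rate shifts every peak. The minimal sketch below uses np.fft.rfft and np.fft.rfftfreq on a synthetic tone; the function name dominant_frequency and the 440 Hz / 8000 Hz test signal are illustrative only.

```python
import numpy as np


def dominant_frequency(samples: np.ndarray, sr: int) -> float:
    """Return the frequency in Hz of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    # rfftfreq maps each bin index k to k * sr / n Hz
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])


# One second of a 440 Hz sine sampled at 8000 Hz (a whole number of cycles)
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

print(dominant_frequency(tone, sr))     # correct rate: peak near 440 Hz
print(dominant_frequency(tone, 22050))  # wrong rate: near 440 * 22050 / 8000 = 1212.75 Hz
```

The same samples give a different "peak" purely because of the rate passed in, so whichever loader is used, the sr handed to the FFT step should be the rate those samples actually have.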



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
