Can I do recognition from a numpy array in Python SpeechRecognition?

I'm recording audio into a numpy array dt and then writing it to a .wav file with code like this:

dt = np.int16(dt/np.max(np.abs(dt)) * 32767)
scipy.io.wavfile.write("tmp.wav", samplerate, dt)

After that I read the file back and recognize it with this code:

import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile("tmp.wav") as source:
    audio_text = r.listen(source)
    return r.recognize_google(audio_text, language = lang)

Can I run recognition directly on the numpy array without going through a .wav file? Writing the file and reading it back takes extra time.



Solution 1:[1]

Assuming you are using the SpeechRecognition package, its documentation says you can pass any file-like object to AudioFile(). File-like objects are objects that support file operations such as read and write.

You should be able to put the byte representation of the WAV file into an io.BytesIO object, which supports these operations, and pass that to the speech recognition module. scipy.io.wavfile.write() supports writing to such file-like objects.

I don't have the package or any WAV files to test it, but let me know if something like this works:

import io

wav_bytes = io.BytesIO()
scipy.io.wavfile.write(wav_bytes, samplerate, dt)
wav_bytes.seek(0)  # rewind so AudioFile reads from the start of the buffer
with sr.AudioFile(wav_bytes) as source:
    ...
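
For completeness, here is a slightly fuller sketch of the same idea, combining the normalization from the question with the in-memory buffer. The helper names array_to_wav_buffer and recognize_array are made up for illustration, and the default language code is an assumption. I have not run the recognition call against the Google API, so treat it as a starting point rather than a tested solution.

```python
import io

import numpy as np
from scipy.io import wavfile


def array_to_wav_buffer(samples, samplerate):
    """Return an in-memory WAV file (io.BytesIO) holding `samples` as 16-bit PCM.

    Hypothetical helper: scales to the int16 range as in the question.
    """
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    if peak > 0:  # avoid dividing by zero on pure silence
        samples = samples / peak
    pcm = np.int16(samples * 32767)

    buf = io.BytesIO()
    wavfile.write(buf, samplerate, pcm)
    buf.seek(0)  # make sure readers start at the WAV header
    return buf


def recognize_array(samples, samplerate, language="en-US"):
    """Hypothetical wrapper: recognize speech without touching the disk."""
    import speech_recognition as sr  # imported here so the helper above works without it

    recognizer = sr.Recognizer()
    with sr.AudioFile(array_to_wav_buffer(samples, samplerate)) as source:
        audio = recognizer.record(source)  # read the whole buffer
    return recognizer.recognize_google(audio, language=language)
```

Calling recognize_array(dt, samplerate, lang) would then replace both the wavfile.write and the AudioFile("tmp.wav") steps from the question.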

Solution 2:[2]

You can build an AudioData object yourself and pass it straight to the recognizer. AudioData is the input type the recognizer works on, so no AudioFile source is needed:

import io
from scipy.io.wavfile import write
import speech_recognition

byte_io = io.BytesIO()
write(byte_io, sr, audio_array)  # here sr is the sampling rate (an int)
result_bytes = byte_io.getvalue()  # full buffer contents, regardless of stream position

audio_data = speech_recognition.AudioData(result_bytes, sr, 2)
r = speech_recognition.Recognizer()
text = r.recognize_google(audio_data)

audio_array is a 1-D numpy.ndarray with int16 values and sr is the sampling rate.
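
One caveat on the snippet above: result_bytes holds a complete WAV file, header included, while the first argument of AudioData is meant to be raw frame data, so the header bytes end up being treated as a few extra samples. A possible variant that skips the WAV step altogether is sketched below. The helper names array_to_pcm16_bytes and recognize_pcm are hypothetical, the code assumes mono audio in a 1-D array, and the recognition call itself is untested here.

```python
import numpy as np


def array_to_pcm16_bytes(samples):
    """Return raw little-endian 16-bit PCM bytes (no WAV header) for a 1-D array.

    Hypothetical helper: normalizes to the int16 range as in the question.
    """
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    if peak > 0:  # avoid dividing by zero on pure silence
        samples = samples / peak
    return (samples * 32767).astype("<i2").tobytes()


def recognize_pcm(samples, samplerate, language="en-US"):
    """Hypothetical wrapper: hand raw PCM straight to the recognizer."""
    import speech_recognition  # imported here so the helper above works without it

    frame_data = array_to_pcm16_bytes(samples)
    # sample_width=2 because each int16 sample occupies two bytes
    audio_data = speech_recognition.AudioData(frame_data, samplerate, 2)
    recognizer = speech_recognition.Recognizer()
    return recognizer.recognize_google(audio_data, language=language)
```

If audio_array is already int16, audio_array.tobytes() can be passed to AudioData directly, without the normalization step.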

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anroesti
Solution 2 H_Barrio