Get all transcript results using the Google Speech-to-Text API

I would like to know whether it is possible to get all the possible transcripts that Google can generate from a given audio file. As the code below shows, I currently get only the transcript with the highest match.

from google.cloud import speech
import os
import io

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = ''


# Creates google client
client = speech.SpeechClient()

# Full path of the audio file; replace with your file name
file_name = os.path.join(os.path.dirname(__file__), "test2.wav")

# Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb"    
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

print(response.results)

# Reads the response
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))


Solution 1:[1]

In your RecognitionConfig(), set max_alternatives to a value greater than 1. Each result in the response will then include the other possible transcriptions, not just the top one.

max_alternatives int

Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than max_alternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.
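
To make the rule quoted above concrete, here is a minimal sketch of how a requested value maps to the upper bound on alternatives per result. The helper name effective_max_alternatives is hypothetical and not part of google-cloud-speech, and rejecting out-of-range values locally is an assumption of this sketch; the documentation only states the valid range.

```python
def effective_max_alternatives(max_alternatives=None):
    """Return the upper bound on alternatives per result for a requested value.

    Hypothetical helper (not a google-cloud-speech API). It mirrors the
    documented rule: valid values are 0-30, and 0, 1, or an omitted value
    all mean "at most one alternative".
    """
    if max_alternatives is None:
        # Omitted: the service returns a maximum of one alternative.
        return 1
    if not 0 <= max_alternatives <= 30:
        # Assumption: fail early on the client instead of sending the request.
        raise ValueError("max_alternatives must be between 0 and 30")
    # 0 and 1 both yield at most one alternative.
    return max(1, max_alternatives)


print(effective_max_alternatives(10))
```

This is only an upper bound: the server may still return fewer alternatives than requested.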

Update your RecognitionConfig() to the code below:

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb",
    max_alternatives=10  # any value from 0 to 30; 0 or 1 returns a single alternative
)

I tested this using the sample audio from the GitHub repo of the Speech API, with the code below:

from google.cloud import speech
import os
import io

# Creates google client
client = speech.SpeechClient()

# Full path of the audio file; replace with your file name
file_name = os.path.join(os.path.dirname(__file__), "audio.raw")

# Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="en-us",
    max_alternatives=10 # used 10 for testing
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

for result in response.results:
    print(result.alternatives)
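
If you would rather see one line per alternative instead of the raw message dump, something like the sketch below might help. It only relies on the results, alternatives, transcript, and confidence fields of the response. The helper names iter_alternatives and format_alternatives are made up for illustration, and the stand-in response uses invented data so the snippet runs without credentials; with a real call you would pass the object returned by client.recognize(). As far as I know, confidence is usually populated only for the top alternative, and 0.0 means it was not set.

```python
from types import SimpleNamespace


def iter_alternatives(response):
    """Yield (result_index, rank, transcript, confidence) for every alternative."""
    for result_index, result in enumerate(response.results):
        # rank 1 is the top hypothesis, rank 2 the next one, and so on
        for rank, alternative in enumerate(result.alternatives, start=1):
            yield result_index, rank, alternative.transcript, alternative.confidence


def format_alternatives(response):
    """Return one human-readable line per alternative."""
    return [
        f"result {result_index}, alternative {rank}: "
        f"{transcript!r} (confidence={confidence:.2f})"
        for result_index, rank, transcript, confidence in iter_alternatives(response)
    ]


# Stand-in for a real response; the transcripts and scores are invented.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(transcript="hello world", confidence=0.9),
                SimpleNamespace(transcript="hello word", confidence=0.0),
            ]
        )
    ]
)

for line in format_alternatives(fake_response):
    print(line)
```

Longer audio is typically split into several results, each with its own list of alternatives, which is why the result index is kept alongside the rank.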


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1