How to get proper PCM stream from Microsoft.CognitiveServices.Speech in C#

I am trying to use Azure TTS with Discord, but I can't get the audio stream from Azure TTS into Discord. I use Discord.Net (https://discordnet.dev/guides/voice/sending-voice.html).

public static async Task<MemoryStream> GetTTSStream(string text)
{
    var config = SpeechConfig.FromSubscription("", "");
    using SpeechSynthesizer synthesizer = new(config, null);

    SpeechSynthesisResult result = await synthesizer.SpeakTextAsync(text).ConfigureAwait(false);

    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
    {
        var audioStream = AudioDataStream.FromResult(result);

        var buffer = result.AudioData;
        return new MemoryStream(buffer);
    }
    else if (result.Reason == ResultReason.Canceled)
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);

        StringBuilder sb = new StringBuilder();
        sb.AppendLine($"CANCELED: Reason={cancellation.Reason}");
        sb.AppendLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        sb.AppendLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");

        Logger.Warning(sb.ToString());
    }
    return null;
}


Solution 1:[1]

As suggested by ramr-msft | Microsoft Docs:

You can try the speech synthesis sample that writes to a pull audio output stream:

// Speech synthesis to pull audio output stream.
public static async Task SynthesisToPullAudioOutputStreamAsync()
{
    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Creates an audio out stream.
    using (var stream = AudioOutputStream.CreatePullStream())
    {
        // Creates a speech synthesizer using audio stream output.
        using (var streamConfig = AudioConfig.FromStreamOutput(stream))
        using (var synthesizer = new SpeechSynthesizer(config, streamConfig))
        {
            while (true)
            {
                // Receives a text from console input and synthesize it to pull audio output stream.
                Console.WriteLine("Enter some text that you want to synthesize, or enter empty text to exit.");
                Console.Write("> ");
                string text = Console.ReadLine();

                if (string.IsNullOrEmpty(text))
                {
                    break;
                }

                using (var result = await synthesizer.SpeakTextAsync(text))
                {
                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    {
                        Console.WriteLine($"Speech synthesized for text [{text}], and the audio was written to output stream.");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }
            }
        }

        // Reads(pulls) data from the stream
        byte[] buffer = new byte[32000];
        uint filledSize = 0;
        uint totalSize = 0;
        while ((filledSize = stream.Read(buffer)) > 0)
        {
            Console.WriteLine($"{filledSize} bytes received.");
            totalSize += filledSize;
        }
        Console.WriteLine($"Totally {totalSize} bytes received.");
    }
}
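
The read loop in the sample only counts the bytes it pulls. To end up with a stream like the one the question returns, the pulled bytes have to be collected somewhere. Below is a minimal sketch of that step, not part of the original sample. `PullStreamReader` and `ReadToEnd` are hypothetical names, and the `read` delegate stands in for a call shaped like `stream.Read(buffer)` above (fill the buffer, return the number of bytes written, return 0 at the end), so the helper itself does not depend on the Speech SDK.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Speech SDK or the original sample.
public static class PullStreamReader
{
    // Calls `read` repeatedly until it returns 0 and collects every byte
    // it produced into a MemoryStream. `read` must fill the buffer from
    // index 0 and return how many bytes it wrote.
    public static MemoryStream ReadToEnd(Func<byte[], uint> read, int chunkSize = 32000)
    {
        if (read == null) throw new ArgumentNullException(nameof(read));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        var output = new MemoryStream();
        uint filled;
        while ((filled = read(buffer)) > 0)
        {
            // Keep only the part of the buffer that was actually filled.
            output.Write(buffer, 0, (int)filled);
        }

        output.Position = 0; // rewind so the caller reads from the start
        return output;
    }
}
```

With the sample's `stream`, the call might look like `PullStreamReader.ReadToEnd(buffer => stream.Read(buffer))`. The sample reads only after the synthesizer's `using` block has ended, and this sketch assumes the same ordering.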

References: "How to get PCM stream from Azure's SpeechSynthesizer" (Microsoft Q&A) and speech_synthesis_samples.cs in the Azure-Samples/cognitive-services-speech-sdk repository (GitHub).
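
For the Discord side of the question, the audio format matters as much as how the bytes are obtained. As far as I can tell, the Discord.Net guide linked in the question feeds its PCM stream with 16-bit, 48 kHz, stereo audio. The Speech SDK lets you pick the output format with `config.SetSpeechSynthesisOutputFormat(...)`, and the `SpeechSynthesisOutputFormat` enum includes `Raw48Khz16BitMonoPcm`, which gives headerless 48 kHz samples but in mono. A minimal sketch of the remaining step, duplicating each mono sample into a left and right channel, is below. `PcmUpmix` and `MonoToStereo16` are hypothetical names, and the code assumes the input is raw 16-bit PCM with no WAV header.

```csharp
using System;

// Hypothetical helper, not part of the Speech SDK or Discord.Net.
public static class PcmUpmix
{
    // Turns raw 16-bit mono PCM into interleaved 16-bit stereo PCM by
    // writing every 2-byte sample twice (once per channel). The sample
    // rate is unchanged.
    public static byte[] MonoToStereo16(byte[] mono)
    {
        if (mono == null) throw new ArgumentNullException(nameof(mono));
        if (mono.Length % 2 != 0)
            throw new ArgumentException("Expected whole 16-bit samples.", nameof(mono));

        var stereo = new byte[mono.Length * 2];
        for (int i = 0, o = 0; i < mono.Length; i += 2, o += 4)
        {
            stereo[o] = mono[i];         // left channel, first byte
            stereo[o + 1] = mono[i + 1]; // left channel, second byte
            stereo[o + 2] = mono[i];     // right channel, first byte
            stereo[o + 3] = mono[i + 1]; // right channel, second byte
        }
        return stereo;
    }
}
```

In the question's method this would mean returning `new MemoryStream(PcmUpmix.MonoToStereo16(result.AudioData))` after selecting the raw 48 kHz format on `config`. I have not tested this against a live voice channel, so treat it as a starting point.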

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MadhurajVadde-MT