'About the usage of vocoders

I'm quite new to AI and I'm currently developing a model for non-parallel voice conversions. One confusing problem that I have is the use of vocoders.

So my model needs Mel spectrograms as the input and the current model that I'm working on is using the MelGAN vocoder (Github link) which can generate 22050Hz Mel spectrograms from raw wav files (which is what I need) and back. I recently tried WaveGlow Vocoder (PyPI link) which can also generate Mel spectrograms from raw wav files and back.

But, in other models such as, WaveRNN , VocGAN , WaveGrad There's no clear explanation about wav to Mel spectrograms generation. Do most of these models don't require the wav to Mel spectrograms feature because they largely cater to TTS models like Tacotron? or is it possible that all of these have that feature and I'm just not aware of it?

A clarification would be highly appreciated.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'About the usage of vocoders

Sources

Related Questions