Why would we use Connectionist Temporal Classification (CTC) in speech recognition?
I am new to speech recognition and have read some blog posts about CTC. It tackles sequence problems where the timing is variable: one speech signal may contain multiple words, and labeling every time step would be tedious and require a lot of effort. So CTC is used to make end-to-end training possible without per-time-step labels.
I am wondering: why not just record one spoken word per signal, instead of multiple words per signal, and train the neural network on those recordings? (Then we wouldn't need to align the labels with the signal.)
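To make the alignment issue concrete, here is a minimal sketch of the collapsing rule that CTC uses to map per-frame outputs to a label sequence: merge consecutive repeats, then drop the blank. It is an illustration only, not any library's API. The `BLANK` symbol `-`, the function names `ctc_collapse` and `count_alignments`, and the use of characters as labels are all assumptions made for this example.

```python
from itertools import product

BLANK = "-"  # assumed symbol for the CTC blank label (illustrative choice)


def ctc_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


def count_alignments(target, num_frames, alphabet):
    """Brute-force count of frame-level paths that collapse to `target`.

    Only feasible for tiny inputs; real CTC uses dynamic programming.
    """
    symbols = list(alphabet) + [BLANK]
    return sum(
        1
        for path in product(symbols, repeat=num_frames)
        if ctc_collapse(path) == target
    )


print(ctc_collapse("hh-e-ll-lo"))        # hello
print(count_alignments("ab", 3, "ab"))  # 5
```

The five paths for the target `ab` over three frames are `aab`, `abb`, `-ab`, `a-b`, and `ab-`. So even a very short label sequence has several valid frame-level alignments, and CTC sums over all of them during training instead of requiring one hand-labeled alignment.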
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
