IBM is one of the companies pushing AI forward. It is developing algorithms and models that recognize speech patterns and speaking styles. Its text-to-speech (TTS) modeling has advanced so far that, from just five minutes of recorded speech, it can create a highly adaptable synthetic voice that is nearly indistinguishable from the real human one.
The synthesizer keeps improving as it hears more. After 10, and especially 20, minutes of "listening" to sample speech, it can read any text in that voice very naturally.
IBM says the key to this performance is the modular architecture of its neural speech synthesis: the system detects and trains each aspect of the voice independently. This helps the result retain the original character of the voice.
But you don't have to take our word for it: here you can listen to samples for different voices trained on 5, 10 and 20 minutes of speech. You can also make them speak text of your choice here.
Written by: Matti Vähäkainu @ 1 Oct 2019 12:15