Segment of One – Now it’s Real
“Segment of One” is where every customer in a database of millions can be treated in a different way. Although there’s been buzz about that since at least 1989, true […]
Amazon has introduced an impressive new model called BASE TTS (TTS = text-to-speech). TTS models accept written text as input and speak it aloud. They are what we use to create talking avatars and chatbots, among many other use cases.
BASE TTS stands for Big Adaptive Streamable TTS with Emergent abilities.
Until now, the top TTS models have been YourTTS, Bark and Tortoise-TTS, each pushing synthesized speech closer to human-like speech. Amazon set out to beat them with BASE TTS by training on more data: it is a billion-parameter model trained on 100,000 hours of audio data.
The video covers seven areas where text-to-speech models are known to stumble, presented in ascending order of difficulty.
The video then presents 8 audio samples created by BASE TTS, each illustrating the model attempting one of the especially difficult tasks described above.
The results are quite impressive. Give a listen and see what you think!
Copyright AI Master Group.