Segment of One – Now it’s Real
“Segment of One” is where every customer in a database of millions can be treated in a different way. Although there’s been buzz about that since at least 1989, true […]
Amazon has introduced a new model called BASE TTS (TTS = text-to-speech). TTS models accept written text as input and speak it aloud. They are what we use to create talking avatars and chatbots, among many other use cases.
BASE stands for Big Adaptive Streamable TTS with Emergent abilities.
Until now, the top TTS models have been YourTTS, Bark and Tortoise-TTS, each pushing speech synthesis closer to human-like speech. With BASE TTS, Amazon set out to beat them by training on more data than they did: it's a billion-parameter model trained on 100,000 hours of audio data.
The video covers seven areas where text-to-speech is known to stumble sometimes. In ascending order of difficulty, those are:
The video then presents 8 audio samples created by BASE TTS, each illustrating the model attempting one of the especially difficult tasks described above.
The results are quite impressive. Give a listen and see what you think!
Copyright AI Master Group 2023-24