Date: 28/12/17
Google’s voice-generating AI system is closer than ever to imitating the human voice
Artificial intelligence has evolved at a rapid pace, to the point that it now runs quietly in the background of everyday connected devices such as smartphones and smart speakers. The uses for AI are endless, but tech giants like Apple and Google largely use it to perform actions on our devices so that we don't have to. Google in particular has excelled in AI for a while, and its Assistant shows how far it has come. Not only does it carry out most actions through voice recognition, it also responds in a voice that comes ever closer to sounding as natural as a human's. Voice generation has come a long way from sounding stiff and unnatural to smooth and life-like, and a new report suggests Google is closer than ever to achieving the latter.
A recent research paper published by Google describes a text-to-speech system called Tacotron 2, which comes close to human quality when generating speech from text, Quartz reports. The paper explains that the AI's speech is so human-like that it is indistinguishable from that of a person reading the same text. The system comprises two deep neural networks. The first translates text into a spectrogram, which is essentially a visual representation of how audio frequencies change over time. The spectrogram is then fed into WaveNet, a speech generation algorithm from Google DeepMind, which reads it and generates the corresponding audio.
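To make the idea of a spectrogram more concrete, here is a minimal Python sketch. It is not Tacotron 2's code, and it runs in the opposite direction from the system described above: it starts from audio and computes the spectrogram, purely to show what that representation contains. The function name `spectrogram`, the frame length, and the sample rate are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math

FRAME_LEN = 64      # samples per analysis frame (illustrative choice)
HOP = 32            # samples between the starts of consecutive frames
SAMPLE_RATE = 8000  # Hz (illustrative choice)

def spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window, which tapers the edges of each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform, keeping non-negative frequencies only.
        row = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        rows.append(row)
    return rows

# A synthetic test signal: a pure 1000 Hz tone, 1024 samples long.
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE) for n in range(1024)]

spec = spectrogram(signal)

# Each column covers SAMPLE_RATE / FRAME_LEN hertz; find the strongest one.
bin_hz = SAMPLE_RATE / FRAME_LEN
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * bin_hz
print(len(spec), "frames x", len(spec[0]), "frequency bins; peak at", peak_hz, "Hz")
```

Each row of `spec` describes a short slice of time and each column a band of frequencies, so drawing the values as an image gives the "visual representation" mentioned above. In the system the article describes, the first network produces a representation of this kind from text, and WaveNet turns it into a waveform.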
Quartz's report includes a few pairs of audio samples in which the same sentence is spoken once by Tacotron 2 and once by a human. Without knowing which is which, it is surprisingly hard to tell the two apart. Notably, Tacotron 2 can also handle names and words that are hard to pronounce. It can also change the way it speaks and stress certain words depending on the punctuation. For now, the system has only been trained to imitate one voice, and Google will have to train it again for it to speak in a different voice.
The paper essentially aims to show that Google's AI has achieved near life-like speech generation, far more refined than the stilted synthetic speech of yesteryear. The technology could be applied to Google's AI voice service right away, since WaveNet was deployed for Assistant earlier this year. This will likely make Assistant sound more natural, more accurate and more scarily human-like than ever before, narrowing the gap in human-computer interaction even further.
©ictnews.az. All rights reserved.