Google launches Cloud Text-to-Speech service powered by DeepMind’s AI for more natural-sounding voices


Google WaveNet

Back in October 2017, DeepMind, the artificial intelligence company that is part of Alphabet, announced that its WaveNet technology, a deep neural network for generating raw audio waveforms, was being used to produce better, more natural-sounding speech for the Google Assistant in English and Japanese.

Google today announced that it is bringing this technology to Google Cloud Platform with Cloud Text-to-Speech, open to any developer or business that needs voice synthesis on tap, whether that’s for an app, a website, or a virtual assistant.

The company says it can be used in a variety of ways, including powering interactive voice response (IVR) systems for call centers, enabling real-time natural language conversations, letting IoT devices talk back to you, and converting text such as audiobooks into spoken form.

There are 32 basic voices to choose from, across 12 languages. Cloud Text-to-Speech correctly pronounces complex text such as names, dates, times, and addresses for authentic-sounding speech. It also lets you customize pitch, speaking rate, and volume gain, and it supports a variety of audio formats, including MP3 and WAV.
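To illustrate how those knobs map onto an API request, here is a minimal sketch using the Python client library (the google-cloud-texttospeech package). The class and parameter names follow the current client and are assumptions for illustration, not the exact API surface at launch.

```python
from google.cloud import texttospeech

# Create a client (assumes application credentials are already configured).
client = texttospeech.TextToSpeechClient()

# The text to synthesize, including the kind of names/dates/times the
# service is expected to pronounce correctly.
synthesis_input = texttospeech.SynthesisInput(
    text="Your meeting with Dr. Alvarez starts at 3:30 PM on March 27."
)

# Pick a language and let the service choose a matching voice.
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)

# Customize pitch, speaking rate, and volume gain; request MP3 output.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.1,    # 1.0 is normal speed
    pitch=-2.0,           # in semitones
    volume_gain_db=0.0,
)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response carries the raw encoded audio bytes.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
```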

The improved WaveNet model generates audio 1,000 times faster than the original model. Fidelity has also been increased to 24,000 samples per second, and the resolution has been bumped up from 8 to 16 bits, all of which produces higher-quality audio for a more human sound.
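To hear one of the WaveNet voices at that higher fidelity, a request can name a specific voice and ask for 16-bit linear PCM at 24,000 Hz. The sketch below again uses the Python client; the specific voice name (en-US-Wavenet-A) and output settings are illustrative assumptions, and the available voices can be listed from the API itself.

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Request a specific WaveNet voice by name.
# (Assumed example name; enumerate real options with client.list_voices().)
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-A",
)

# Ask for uncompressed 16-bit PCM (LINEAR16) at 24 kHz to match the
# higher-fidelity WaveNet output described above.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,
    sample_rate_hertz=24000,
)

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Hello from a WaveNet voice."),
    voice=voice,
    audio_config=audio_config,
)

with open("output.wav", "wb") as out:
    out.write(response.audio_content)
```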

Cloud Text-to-Speech is already being used by companies such as Cisco and Dolphin ONE. WaveNet voices also require less recorded audio input to produce high-quality models, and Google says it will improve both the variety and the quality of the WaveNet voices available to Cloud customers in the coming months.

Source