Baidu’s text-to-speech system mimics a variety of accents

PanARMENIAN.Net - Chinese tech giant Baidu's text-to-speech system, Deep Voice, is making rapid progress toward sounding more human. The latest news about the tech is a set of audio samples showcasing its ability to accurately portray differences in regional accents. The company says that the new version, aptly named Deep Voice 2, has been able to "learn from hundreds of unique voices from less than a half an hour of data per speaker, while achieving high audio quality." By comparison, the previous iteration needed 20 hours of training to get similar results for a single voice. The improvement pushes its efficiency even further past Google's WaveNet in a few months' time, Engadget said.

Baidu says that, unlike previous text-to-speech systems, Deep Voice 2 finds shared qualities between the training voices entirely on its own, without any prior guidance. "Deep voice 2 can learn from hundreds of voices and imitate them perfectly," a blog post says.

In a research paper, Baidu concludes that its neural network can generate speech fairly effectively even from small voice samples from hundreds of different speakers. All of which is to say, it might not be long before we start hearing digital assistants that are more representative of the voices users encounter in their day-to-day lives, Engadget said.

---