A Unified Phonological Representation of South Asian Languages for Multilingual Text-to-Speech

Isin Demirsahin; Martin Jansche; Alexander Gutkin

Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), International Speech Communication Association (ISCA), 29-31 August, Gurugram, India (2018), pp. 80-84

Abstract

We present a multilingual phoneme inventory and inclusion mappings from the native inventories of several major South Asian languages for multilingual parametric text-to-speech synthesis (TTS). Our goal is to reduce the need for training data when building new TTS voices by leveraging available data for similar languages within a common feature design. For West Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, and Urdu, we compare TTS voices trained only on monolingual data with voices trained on multilingual data from 12 languages. In subjective evaluations, multilingually trained voices outperform (or in a few cases are statistically tied with) the corresponding monolingual voices. The multilingual setup can further be used to synthesize speech for languages not seen in the training data; preliminary evaluations lean towards good. Our results indicate that pooling data from different languages in a single acoustic model can be beneficial, opening up new uses and research questions.

Research Areas

Natural language processing
