Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech

Jaka Aris Eko Wibawa; Supheakmungkol Sarin; Chen Fang Li; Knot Pipatsrisawat; Keshan Sodimana; Oddur Kjartansson; Alexander Gutkin; Martin Jansche; Linne Ha

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), 7-12 May 2018, Miyazaki, Japan, pp. 1610-1614

Abstract

We present multi-speaker text-to-speech corpora for Javanese and Sundanese,
the second and third largest languages of Indonesia, spoken by well
over a hundred million people. The key objectives were to collect high-quality
data in an affordable way and to share the data publicly with the speech
community. To achieve this, we collaborated with two local universities in Java and
streamlined our recording and crowdsourcing processes to produce corpora
consisting of 5.8 thousand (Javanese) and 4.2 thousand (Sundanese) mixed-gender
recordings. We used these corpora to build several configurations of multi-speaker
neural network-based text-to-speech systems for Javanese and Sundanese. Subjective
evaluations performed on these configurations demonstrate that multilingual
configurations in which Javanese and Sundanese are trained jointly with a
larger Indonesian corpus significantly outperform the systems constructed
from a single language. We hope that sharing these corpora publicly and
presenting our multilingual approach to text-to-speech will help the community
to scale up text-to-speech technologies to other lesser-resourced languages
of Indonesia.

Research Areas

Natural language processing
