Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech
Abstract
We present multi-speaker text-to-speech corpora for Javanese and Sundanese,
the second- and third-largest languages of Indonesia, spoken by well
over a hundred million people. The key objectives were to collect high-quality
data in an affordable way and to share the data publicly with the speech
community. To achieve this, we collaborated with two local universities in Java and
streamlined our recording and crowdsourcing processes to produce corpora
consisting of 5.8 thousand (Javanese) and 4.2 thousand (Sundanese) mixed-gender
recordings. We used these corpora to build several configurations of multi-speaker
neural network-based text-to-speech systems for Javanese and Sundanese. Subjective
evaluations performed on these configurations demonstrate that multilingual
configurations in which Javanese and Sundanese are trained jointly with a
larger Indonesian corpus significantly outperform the systems constructed
from a single language. We hope that sharing these corpora publicly and
presenting our multilingual approach to text-to-speech will help the community
to scale up text-to-speech technologies to other lesser-resourced languages
of Indonesia.