- Keshan Sodimana
- Knot Pipatsrisawat
- Linne Ha
- Martin Jansche
- Oddur Kjartansson
- Pasindu De Silva
- Supheakmungkol Sarin
Abstract
The availability of language resources is vital for the development of text-to-speech (TTS) systems. Open-source data and tools are therefore highly beneficial for research communities, especially those focusing on low-resourced languages. In this paper, we present data sets for six low-resourced languages that we have open-sourced. The data sets consist of audio files, pronunciation lexicons, and phonology definitions for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese. These data sets are sufficient for building TTS voices in these languages. We also describe a recipe for building a new TTS voice using our data together with openly available resources and tools.