A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese

Keshan Sodimana

Knot Pipatsrisawat

Linne Ha

Martin Jansche

Oddur Kjartansson

Pasindu De Silva

Supheakmungkol Sarin

Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (2018), pp. 66-70

Abstract

The availability of language resources is vital for the development of text-to-speech (TTS) systems. Open source data and tools are therefore highly beneficial for research communities, especially those focusing on low-resourced languages. In this paper, we present data sets for six low-resourced languages that we have open-sourced. The data sets consist of audio files, pronunciation lexicons, and phonology definitions for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese. These data sets are sufficient for building TTS voices in these languages. We also describe a recipe for building a new TTS voice using our data together with openly available resources and tools.
