Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese TTS Systems

Keshan Sodimana

Pasindu De Silva

Richard Sproat

A Theeraphol

Chen Fang Li

Alexander Gutkin

Supheakmungkol Sarin

Knot Pipatsrisawat

6th International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2018), International Speech Communication Association (ISCA), 29--31 August, Gurugram, India, pp. 147-151

Download Google Scholar

Abstract

Text normalization is the process of converting non-standard words (NSWs) such as numbers, abbreviations, and time expressions into standard words so that their pronunciations can be derived either through lexicon lookup or by utilizing a program to predict pronunciations from spellings. Text normalization is, thus, an important component of any Text-to-Speech (TTS) system. Without such component, the resulting voice, no matter how good the quality is, may sound unintelligent. Such a component is often built manually by translating language-specific knowledge into rules that can be utilized by TTS pipelines. In this paper, we describe an approach to develop a rule-based text normalization component for many low-resourced languages. We also describe our open source repository containing text normalization grammars for Bangla, Javanese, Khmer, Nepali, Sinhala, Sundanese and present a recipe for utilizing them in a TTS system.

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese TTS Systems

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese TTS Systems

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities