Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech

Yin May Oo
Chen Fang Li
Pasindu De Silva
Supheakmungkol Sarin
Knot Pipatsrisawat
Martin Jansche
Proc. 12th Language Resources and Evaluation Conference (LREC 2020), European Language Resources Association (ELRA), 11-16 May, Marseille, France, pp. 6328-6339

Abstract

This paper introduces an open-source crowd-sourced multi-speaker speech corpus along with a comprehensive set of finite-state transducer (FST) grammars for performing text normalization for the Burmese (Myanmar) language. We also introduce open-source finite-state grammars for performing grapheme-to-phoneme (G2P) conversion for Burmese. These three components are necessary (but not sufficient) for building a high-quality text-to-speech (TTS) system for Burmese, a tonal Southeast Asian language from the Sino-Tibetan family which presents several linguistic challenges. We describe the corpus acquisition process and provide the details of our finite-state-based approach to Burmese text normalization and G2P. Our experiments involve building a multi-speaker TTS system based on long short-term memory (LSTM) recurrent neural network (RNN) models, which were previously shown to perform well for other languages in a low-resource setting. Our results indicate that the data and grammars that we are announcing are sufficient to build reasonably high-quality models comparable to other systems. We hope these resources will facilitate speech and language research on the Burmese language, which is considered by many to be low-resource due to the limited availability of free linguistic data.