Google Research

Burmese Speech Corpus, Finite­-State Text Normalization and Pronunciation Grammars with an Application to Text-­to-­Speech

Proc. 12th Language Resources and Evaluation Conference (LREC 2020), European Language Resources Association (ELRA), 11--16 May, Marseille, France, pp. 6328-6339

Abstract

This paper introduces an open-­source crowd­-sourced multi­-speaker speech corpus along with the comprehensive set of finite-­state transducer (FST) grammars for performing text normalization for the Burmese (Myanmar) language. We also introduce the open­-source finite­-state grammars for performing grapheme­-to­-phoneme (G2P) conversion for Burmese. These three components are necessary (but not sufficient) for building a high­-quality text-­to-­speech (TTS) system for Burmese, a tonal Southeast Asian language from the Sino­-Tibetan family which presents several linguistic challenges. We describe the corpus acquisition process and provide the details of our finite state­based approach to Burmese text normalization and G2P. Our experiments involve building a multi­speaker TTS system based on long short term memory (LSTM) recurrent neural network (RNN) models, which were previously shown to perform well for other languages in a low­-resource setting. Our results indicate that the data and grammars that we are announcing are sufficient to build reasonably high­-quality models comparable to other systems. We hope these resources will facilitate speech and language research on the Burmese language, which is considered by many to be low­resource due to the limited availability of free linguistic data.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work