Google Research

Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali

Proc. of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (2018), pp. 259-263

Abstract

Writing systems in the Brahmic family are alpha-syllabaries, in which a consonant letter without an explicit vowel marker can be ambiguous: it can represent either a consonant phoneme or a CV syllable with an inherent vowel ("schwa"). This schwa-deletion ambiguity must be resolved when converting text to an accurate phonemic representation, particularly for text-to-speech synthesis. We situate the problem of Bengali schwa-deletion in the larger context of grapheme-to-phoneme conversion for Brahmic scripts and solve it using neural network classifiers with graphemic features that are independent of the script and the language. Classifier training is implemented using TensorFlow and related tools. We analyze the impact of both training data size and trained model size, as these reflect real-life constraints on data collection and system deployment. Our method achieves high accuracy for Bengali and is applicable to other languages written with Brahmic scripts.
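As a rough illustration of the problem setup described in the abstract, the sketch below finds the consonant letters in a Bengali word that carry no explicit vowel sign or virama (the positions where a classifier would have to decide whether the inherent vowel is pronounced) and builds a simple window of neighboring graphemes around each one. The function names, the window size, and the padding symbols are assumptions made for illustration; this is not the authors' feature set or implementation, and it does not include the neural classifier itself.

```python
# Illustrative sketch only: names and window size are hypothetical,
# not taken from the paper.

VIRAMA = "\u09cd"  # hasanta: explicitly suppresses the inherent vowel
NUKTA = "\u09bc"   # diacritic that may follow a consonant letter


def is_consonant(ch):
    # Bengali consonant letters: U+0995..U+09B9 plus RRA, RHA, YYA.
    return "\u0995" <= ch <= "\u09b9" or ch in ("\u09dc", "\u09dd", "\u09df")


def is_vowel_sign(ch):
    # Dependent vowel signs: U+09BE..U+09CC, plus the AU length mark.
    return "\u09be" <= ch <= "\u09cc" or ch == "\u09d7"


def candidate_positions(word):
    """Indices of consonants with no explicit vowel sign or virama."""
    positions = []
    for i, ch in enumerate(word):
        if not is_consonant(ch):
            continue
        j = i + 1
        if j < len(word) and word[j] == NUKTA:
            j += 1  # look past a nukta to the next mark
        nxt = word[j] if j < len(word) else ""
        if nxt != VIRAMA and not is_vowel_sign(nxt):
            positions.append(i)
    return positions


def context_window(word, i, k=2):
    """Graphemes from i-k to i+k, padded at the word boundaries."""
    padded = ["<s>"] * k + list(word) + ["</s>"] * k
    return padded[i : i + 2 * k + 1]


if __name__ == "__main__":
    word = "\u0995\u09b2\u09ae"  # three bare consonant letters: ka, la, ma
    for pos in candidate_positions(word):
        print(pos, context_window(word, pos, k=1))
```

Each window could then be encoded (for example, as one-hot or embedded symbols) and passed to a binary classifier that predicts whether the inherent vowel is retained or deleted at that position. Because the features are just neighboring graphemes, the same scheme would carry over to other Brahmic scripts once the code point ranges are swapped.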
