Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme prediction

Antoine Bruguier
Anton Bakhtin
Dravyansh Sharma
Interspeech (2018)


Both automatic speech recognition and text-to-speech systems need accurate pronunciations, typically obtained by using both a pronunciation lexicon and a grapheme-to-phoneme (G2P) model. G2P models typically struggle to predict pronunciations for tail words, and we hypothesized that one reason is that they try to discover general pronunciation rules without using prior knowledge of the pronunciation of related words. Our new approach extends a sequence-to-sequence G2P model by injecting prior knowledge. In addition, our model can be updated without retraining the system. We show that our new model performs significantly better for German, both on a tightly controlled task and on our real-world system. Finally, the simplification of the system allows for faster and easier scaling to other languages.
