Cross-lingual Phoneme Mapping for Language-Robust Contextual Speech Recognition
Abstract
The use of foreign entities in automatic speech recognition (ASR) systems is prevalent in various applications, yet correctly recognizing these foreign words while preserving accuracy on native words remains a challenge. We describe a novel approach for recognizing foreign words by injecting them, with correctly mapped pronunciations, into the recognizer's decoder search space on the fly. The phoneme mapping between languages is learned automatically using acoustic coupling of text-to-speech (TTS) audio and a pronunciation learning algorithm. The mapping allows us to reuse the pronunciation dictionary of a foreign language by mapping its pronunciations to the phoneme inventory of the target recognizer's language. Evaluation of our algorithm on Google Assistant use cases shows that we can recognize English songs with high accuracy on French and German recognizers without hurting recognition on general traffic.
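To make the role of the mapping concrete, the sketch below shows only the final step: rewriting entries of a source-language (English) pronunciation dictionary in the target (French) phoneme inventory. It assumes the mapping is a table from one source phoneme to a possibly empty sequence of target phonemes. The table `EN_TO_FR`, the helpers `map_pronunciation` and `map_lexicon`, and the toy lexicon are hypothetical illustrations; in the described approach the mapping is learned automatically from TTS audio, and the learning step and the on-the-fly injection into the decoder are not shown.

```python
from typing import Dict, List

# Hypothetical English->French phoneme mapping (illustrative only; the paper
# learns its mapping automatically). English phonemes use ARPAbet symbols,
# French phonemes use IPA.
EN_TO_FR: Dict[str, List[str]] = {
    "TH": ["s"],       # French has no dental fricative
    "DH": ["z"],
    "IH": ["i"],
    "S": ["s"],
    "HH": [],          # French lacks /h/, so it maps to nothing
    "AY": ["a", "j"],  # one source phoneme may map to a sequence
    "R": ["ʁ"],
    "NG": ["ŋ"],
}


def map_pronunciation(pron: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Rewrite one source pronunciation in the target phoneme inventory."""
    target: List[str] = []
    for phoneme in pron:
        if phoneme not in mapping:
            # Assumption of this sketch: an unmapped phoneme is an error
            # rather than being copied through unchanged.
            raise ValueError(f"no mapping for source phoneme {phoneme!r}")
        target.extend(mapping[phoneme])
    return target


def map_lexicon(lexicon: Dict[str, List[List[str]]],
                mapping: Dict[str, List[str]]) -> Dict[str, List[List[str]]]:
    """Map every pronunciation variant of every word in a source dictionary."""
    return {word: [map_pronunciation(p, mapping) for p in prons]
            for word, prons in lexicon.items()}


# Toy English lexicon with ARPAbet pronunciations (stress marks omitted).
english_lexicon: Dict[str, List[List[str]]] = {
    "this": [["DH", "IH", "S"]],
    "high": [["HH", "AY"]],
    "thing": [["TH", "IH", "NG"]],
}

if __name__ == "__main__":
    for word, prons in map_lexicon(english_lexicon, EN_TO_FR).items():
        print(word, prons)
```

Allowing a source phoneme to map to zero or several target phonemes (as with `HH` and `AY` above) covers sounds that are absent from, or split differently in, the target inventory; a learned mapping could also keep several weighted alternatives per phoneme, which this simple table does not model.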