Mixture of Informed Experts for Multilingual Speech Recognition

Brian Farris
Pedro Jose Moreno Mengibar
Bhuvana Ramabhadran
Yun Zhu
ICASSP 2021, IEEE International Conference on Acoustics, Speech and Signal Processing (to appear)

Multilingual speech recognition models are capable of recognizing speech in multiple languages. When trained on related or low-resource languages, these models often outperform their monolingual counterparts. However, like other multi-task models, when the languages are unrelated, or when large amounts of training data are available, multilingual models can suffer a performance loss. We investigate a mixture-of-experts approach that assigns per-language parameters in the model, increasing network capacity in a structured fashion. We introduce a novel variant of this approach, 'informed experts', which tackles inter-task conflicts by eliminating gradients from other tasks in these task-specific parameters. We conduct experiments on a real-world task with English, French, and four dialects of Arabic to show the effectiveness of our approach.
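The core idea of 'informed experts' — routing each utterance to its own language's parameters so that gradients from other languages never touch them — can be illustrated with a minimal sketch. This is not the paper's implementation; the language list, layer sizes, and the plain SGD update below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: per-language "informed expert" weight matrices.
# Routing uses the known language ID (a hard, informed gate), not a
# learned soft gate, so each expert only ever sees its own language.
LANGS = ["en", "fr", "ar_a", "ar_b"]  # placeholder language tags
D_IN, D_OUT = 8, 4
experts = {lang: rng.normal(scale=0.1, size=(D_IN, D_OUT)) for lang in LANGS}

def forward(x, lang):
    # Route the input frame to its language's expert only.
    return x @ experts[lang]

def sgd_step(x, lang, grad_out, lr=0.1):
    # For y = x @ W, dL/dW = outer(x, dL/dy). Only the selected
    # expert receives this gradient; all other experts are untouched,
    # eliminating inter-task gradient conflicts on these parameters.
    experts[lang] -= lr * np.outer(x, grad_out)

x = rng.normal(size=D_IN)
before = {lang: experts[lang].copy() for lang in LANGS}
sgd_step(x, "fr", grad_out=np.ones(D_OUT))
changed = [lang for lang in LANGS if not np.allclose(before[lang], experts[lang])]
print(changed)  # → ['fr']
```

A shared backbone (not shown) would still receive gradients from all languages; only the expert branches are shielded in this way.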