Towards Acoustic Model Unification Across Dialects
Abstract
Research has shown that acoustic model performance typically degrades when a model is evaluated on a dialectal variation of its language that was not seen during training. Similarly, models trained simultaneously on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: distillation and multitask learning (MTL). In distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques are superior to the naive model trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. Moreover, while achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
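To make the two training objectives concrete, the sketch below shows how a distillation loss (an ensemble of dialect-specific teachers averaged into a soft target) and an MTL loss (senone classification plus a dialect-identification side task) could be expressed in PyTorch. This is a minimal illustration under assumed hyperparameters (temperature T, interpolation weight alpha, side-task weight lam) and an assumed network shape; it is not the paper's implementation.

```python
# Illustrative sketch of the two unified-acoustic-model objectives described
# in the abstract. The model architecture and all hyperparameters here are
# assumptions for illustration, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnifiedAM(nn.Module):
    """Shared encoder with a senone head and, for MTL, a dialect head."""

    def __init__(self, feat_dim=40, hidden=512, n_senones=8000, n_dialects=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.senone_head = nn.Linear(hidden, n_senones)    # main ASR task
        self.dialect_head = nn.Linear(hidden, n_dialects)  # MTL side task

    def forward(self, x):
        h = self.encoder(x)
        return self.senone_head(h), self.dialect_head(h)


def distillation_loss(student_logits, teacher_logits_list, hard_targets,
                      T=2.0, alpha=0.5):
    """Distill an ensemble of dialect-specific teachers into one student.

    The teacher signal is the average of the dialect-specific models'
    temperature-softened posteriors; it is interpolated with the usual
    cross-entropy against the hard senone targets.
    """
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    # KL between softened teacher and student posteriors, scaled by T^2 to
    # keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    teacher_probs, reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, hard_targets)
    return alpha * soft + (1.0 - alpha) * hard


def mtl_loss(senone_logits, dialect_logits, senone_targets, dialect_targets,
             lam=0.1):
    """Main senone loss plus a down-weighted dialect-identification loss."""
    return (F.cross_entropy(senone_logits, senone_targets)
            + lam * F.cross_entropy(dialect_logits, dialect_targets))
```

Keeping the side-task weight small (lam well below 1) reflects the intent that dialect identification should shape the shared representation rather than compete with the main recognition objective.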