Jump to Content

On Weight Interpolation of the Hybrid Autoregressive Transducer Model

Bhuvana Ramabhadran
Tongzhou Chen
Interspeech 2022, Interspeech 2022 (2022) (to appear)


This paper explores ways to improve a two-pass speech recognition system when the first-pass is hybrid autoregressive transducer model and the second-pass is a neural language model. The main focus is on the scores provided by each of these models, their quantitative analysis, how to improve them and the best way to integrate them with the objective of better recognition accuracy. Several analysis are presented to show the importance of the choice of the integration weights for combining the first-pass and the second-pass scores. A sequence level weight estimation model along with four training criteria are proposed which allow adaptive integration of the scores per acoustic sequence. The effectiveness of this algorithm is demonstrated by constructing and analyzing models on the Librispeech data set.