Hybrid Autoregressive Transducer (HAT)

David Rybach
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 6139-6143

Abstract

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.