Hybrid Autoregressive Transducer (HAT)
Abstract
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.