Google Research

An Analysis of "Attention" in Sequence-to-Sequence Models

Interspeech 2017, ISCA (2017)

Abstract

In this paper, we conduct a detailed investigation of attention-based models for automatic speech recognition (ASR). First, we explore different types of attention, including “online” and full-sequence attention. Second, we explore different sub-word units to see how much of the end-to-end ASR process can reasonably be captured by an attention model. In experimental evaluations, we find that although attention is typically focused over a small region of the acoustics during each step of next-label prediction, full-sequence attention outperforms “online” attention; this gap can be significantly reduced by increasing the length of the segments over which attention is computed. Furthermore, we find that context-independent phonemes are a reasonable sub-word unit for attention models; when used in the second pass to rescore N-best hypotheses, these models provide over a 10% relative improvement in word error rate.
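To make the distinction between full-sequence attention and attention restricted to a segment concrete, here is a minimal sketch in plain Python. It is not the paper's model. It assumes dot-product scoring, a toy 2-dimensional feature space, and a hypothetical helper `attend` with `start`/`end` arguments that delimit the segment; none of these details come from the abstract.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, frames, start=0, end=None):
    """Dot-product attention over frames[start:end] (illustrative helper).

    Returns (weights, context). The weights are padded with zeros so they
    align with the full frame sequence.
    """
    end = len(frames) if end is None else end
    window = frames[start:end]
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in window]
    local = softmax(scores)
    weights = [0.0] * start + local + [0.0] * (len(frames) - end)
    context = [
        sum(w * frame[d] for w, frame in zip(local, window))
        for d in range(len(query))
    ]
    return weights, context

# Toy "acoustic" frames and a decoder query (assumed values).
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 0.0]]
query = [1.0, 0.0]

# Full-sequence attention: every frame is a candidate.
full_weights, full_context = attend(query, frames)

# Segment-restricted attention: only the first three frames are visible.
seg_weights, seg_context = attend(query, frames, start=0, end=3)
```

In this toy setup the full-sequence weights peak on the last frame, which the segment-restricted call cannot see. Widening the segment (a larger `end`) lets the restricted variant recover that frame, which loosely mirrors the abstract's observation that longer segments narrow the gap.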
