Understanding How Encoder-Decoder Architectures Attend

Vinay Ramasesh

Kyle Aitken

Yuan Cao

Niru Maheswaranathan

NeurIPS (2021)


Abstract

Encoder-decoder networks with attention have proven to be a powerful way to solve
many sequence-to-sequence tasks. In these networks, attention aligns encoder and
decoder states and is often used for visualizing network behavior. However, the
mechanisms used by networks to generate appropriate attention matrices are still
mysterious. Moreover, how these mechanisms vary depending on the particular
architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also
not well understood. In this work, we investigate how encoder-decoder networks
solve different sequence-to-sequence tasks. We introduce a way of decomposing
hidden states over a sequence into temporal (independent of input) and input-driven
(independent of sequence position) components. This reveals how attention
matrices are formed: depending on the task requirements, networks rely more
heavily on either the temporal or input-driven components. These findings hold
across both recurrent and feed-forward architectures despite their differences in
forming the temporal components. Overall, our results provide new insight into the
inner workings of attention-based encoder-decoder networks.
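
The decomposition described above can be illustrated with a minimal NumPy sketch. It rests on assumptions that the abstract does not confirm: the temporal component is taken to be the average hidden state at each position across many sequences, the input-driven component the average of what remains for each input token, and everything left over a residual. The function name decompose_hidden_states, the array shapes, and the synthetic data are hypothetical and chosen only for illustration; the paper's exact procedure may differ.

```python
import numpy as np

def decompose_hidden_states(hidden, tokens, vocab_size):
    """Split hidden states into temporal, input-driven, and residual parts.

    hidden: float array of shape (N, T, D), N sequences of length T.
    tokens: int array of shape (N, T), the input token at each position.
    This averaging scheme is an assumed reading of the abstract.
    """
    dim = hidden.shape[-1]
    # Temporal component: mean over sequences at each position, shape (T, D).
    temporal = hidden.mean(axis=0)
    centered = hidden - temporal  # broadcasts over the N axis

    # Input-driven component: mean of the centered states over every
    # occurrence of each token, shape (vocab_size, D).
    input_driven = np.zeros((vocab_size, dim))
    flat_tokens = tokens.reshape(-1)
    np.add.at(input_driven, flat_tokens, centered.reshape(-1, dim))
    counts = np.bincount(flat_tokens, minlength=vocab_size)
    seen = counts > 0
    input_driven[seen] /= counts[seen, None]

    # Residual: whatever neither component explains.
    residual = centered - input_driven[tokens]
    return temporal, input_driven, residual

# Synthetic hidden states built as position vector + token vector.
rng = np.random.default_rng(0)
V, T, D = 5, 4, 3
pos = rng.normal(size=(T, D))
tok = rng.normal(size=(V, D))
# Each token appears exactly once at every position across the V sequences.
tokens = (np.arange(V)[:, None] + np.arange(T)[None, :]) % V
hidden = pos[None, :, :] + tok[tokens]

temporal, input_driven, residual = decompose_hidden_states(hidden, tokens, V)
```

In this toy setup the hidden states are purely additive in position and token, so the residual vanishes and the three parts sum back to the original states. For a trained network the residual would generally be nonzero, and its size relative to the other two components is one way to gauge how well such a decomposition describes the hidden states.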

Research Areas

Machine intelligence
