LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

Jeremiah Lev Milbauer; Annie Louis; Javad Hosseini; Alex Fabrikant; Don Metzler; Tal Schuster

ACL (2023)

Abstract

Transformer encoders contextualize token representations by attending to all other tokens at each layer, so compute grows quadratically with input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages. To this end, we introduce Layer-adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. LAIT can also be introduced at fine-tuning time only, effectively converting an existing pretrained Transformer into a hybrid of the two aforementioned architectures and providing intuitive control over the performance-efficiency tradeoff. In experiments on a wide range of NLP tasks, we find that LAIT significantly improves efficiency while preserving accuracy.
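The "independent, then joint" scheme can be pictured as a per-layer attention pattern. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes an encoder with `num_layers` layers where the first `num_independent_layers` (a hypothetical parameter) restrict attention to tokens in the same segment and all later layers attend across every segment. It also counts scored query-key pairs as a rough proxy for attention cost. The function names are hypothetical.

```python
from typing import List


def segment_ids(segment_lengths: List[int]) -> List[int]:
    """Map each token position to the index of the segment it belongs to."""
    ids: List[int] = []
    for seg, length in enumerate(segment_lengths):
        ids.extend([seg] * length)
    return ids


def lait_attention_mask(segment_lengths: List[int], layer: int,
                        num_independent_layers: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True if token i may attend to token j.

    Layers with index < num_independent_layers allow attention only within the
    same segment (block-diagonal pattern); later layers allow full attention.
    """
    ids = segment_ids(segment_lengths)
    n = len(ids)
    if layer < num_independent_layers:
        return [[ids[i] == ids[j] for j in range(n)] for i in range(n)]
    return [[True] * n for _ in range(n)]


def attention_pairs(segment_lengths: List[int], num_layers: int,
                    num_independent_layers: int) -> int:
    """Count query-key pairs scored over all layers, assuming independent layers
    run each segment separately (no cross-segment scores are computed)."""
    if not 0 <= num_independent_layers <= num_layers:
        raise ValueError("num_independent_layers must be in [0, num_layers]")
    total = sum(segment_lengths)
    independent_cost = sum(length * length for length in segment_lengths)
    joint_cost = total * total
    p = num_independent_layers
    return p * independent_cost + (num_layers - p) * joint_cost


if __name__ == "__main__":
    lengths = [3, 2]  # e.g., a 3-token premise and a 2-token hypothesis
    for p in (0, 8, 12):  # 0 = fully joint Transformer, 12 = Dual Encoder
        print(p, attention_pairs(lengths, num_layers=12, num_independent_layers=p))
```

With segments of 3 and 2 tokens and 12 layers, the printed counts are 300 for a fully joint encoder (`p=0`), 204 when the first 8 layers are independent, and 156 for a pure dual encoder (`p=12`). A boolean mask like the one above reproduces which tokens interact, but masking a full attention matrix still computes every score. The savings assumed in `attention_pairs` come from actually running each segment through the early layers on its own. Because those early-layer outputs depend only on their own segment, they can also be cached and reused across inputs that share that segment.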
