Character-Level Language Modeling with Deeper Self-Attention

Rami Al-Rfou; DK Choe; Noah Constant; Mandy Guo; Llion Jones

Thirty-Third AAAI Conference on Artificial Intelligence (2019)

Abstract

LSTMs and other RNN variants have shown strong performance on character-level language modeling.
These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts.
In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving 1.13 bits per character on text8.
To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
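
Bits per character (bpc), the metric quoted above, is the average negative base-2 log-probability a model assigns to each correct next character. A score of 1.13 bpc therefore corresponds to a geometric-mean probability of about 2^-1.13 ≈ 0.46 per character. The Python sketch below computes bpc from a list of such probabilities. It then combines per-position losses from several layers into one training objective, loosely in the spirit of the auxiliary losses described in the abstract. The function names, the input format, and the single fixed `aux_weight` are illustrative assumptions, not the paper's actual weighting or schedule for intermediate losses.

```python
import math
from typing import Sequence


def bits_per_character(target_probs: Sequence[float]) -> float:
    """Average -log2 p over the probabilities assigned to each correct next character."""
    if not target_probs:
        raise ValueError("need at least one probability")
    if any(not 0.0 < p <= 1.0 for p in target_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return sum(-math.log2(p) for p in target_probs) / len(target_probs)


def loss_with_auxiliary_layers(
    layer_target_probs: Sequence[Sequence[float]],
    aux_weight: float = 0.5,
) -> float:
    """Final-layer loss plus down-weighted losses from intermediate layers.

    layer_target_probs[i][t] is the probability that a (hypothetical) prediction
    head on layer i assigns to the correct character at sequence position t.
    The last row belongs to the final layer. Every position contributes to the
    loss, not only the last one. aux_weight is an illustrative constant, not a
    value taken from the paper.
    """
    if not layer_target_probs:
        raise ValueError("need at least one layer")
    *intermediate, final = layer_target_probs
    loss = bits_per_character(final)
    for probs in intermediate:
        loss += aux_weight * bits_per_character(probs)
    return loss


# A model that gives each correct character probability 0.5 spends exactly 1 bit on it.
print(bits_per_character([0.5, 0.5, 0.5, 0.5]))  # 1.0
# Intermediate layer at 1 bpc (weighted by 0.5) plus final layer at 2 bpc.
print(loss_with_auxiliary_layers([[0.5, 0.5], [0.25, 0.25]]))  # 2.5
```

Measuring every term in bits keeps the final-layer loss directly comparable to the reported bpc figure. Using natural logarithms instead would only rescale the loss by a constant factor.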
