Google Research

Long Range Language Modeling via Gated State Spaces

arXiv (2022)


State space models have been shown to be effective for modeling long-range dependencies, specifically on sequence classification tasks. In this paper we focus on autoregressive sequence modeling over natural language, GitHub code, and arXiv mathematics articles. Building on recent developments around the effectiveness of gated activation functions, we propose a new layer, the Gated State Space (GSS) layer. We show that GSS trains significantly faster than the diagonal version of S4 (i.e., DSS) on TPUs, is simple to implement, and is fairly competitive with several well-tuned Transformer-based baselines. Finally, we show that interleaving traditional Transformer blocks with GSS improves performance even further.
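
To make the idea of combining a diagonal state space recurrence with multiplicative gating more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the architecture from the paper: the abstract does not specify the layer's internals, so the function names (`diagonal_ssm`, `gated_ssm_layer`), the real-valued diagonal parameters, the GELU gate, the residual connection, and all shapes are hypothetical choices made for this example.

```python
import numpy as np


def diagonal_ssm(u, lam, b, c):
    """Run a diagonal linear state space recurrence over a 1-D sequence.

    x_k = lam * x_{k-1} + b * u_k,    y_k = sum(c * x_k)

    u: (seq_len,) input sequence; lam, b, c: (n_state,) real parameters.
    """
    x = np.zeros_like(lam, dtype=float)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = lam * x + b * u_k      # elementwise update: the state matrix is diagonal
        y[k] = np.sum(c * x)       # linear readout of the state
    return y


def gelu(z):
    """Tanh approximation of the GELU activation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


def gated_ssm_layer(u, w_gate, w_val, w_out, lam, b, c):
    """Hypothetical gated state space layer (an assumption, not the paper's GSS).

    u: (seq_len, d_model); w_gate, w_val: (d_model, d_hidden);
    w_out: (d_hidden, d_model); lam, b, c: (d_hidden, n_state).
    """
    gate = gelu(u @ w_gate)        # position-wise gate branch
    val = u @ w_val                # branch that is mixed along the sequence
    mixed = np.stack(
        [diagonal_ssm(val[:, j], lam[j], b[j], c[j]) for j in range(val.shape[1])],
        axis=1,
    )
    return u + (gate * mixed) @ w_out   # gate the SSM output, project, add residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_hidden, n_state = 8, 4, 6, 3
    u = rng.normal(size=(seq_len, d_model))
    params = dict(
        w_gate=rng.normal(size=(d_model, d_hidden)),
        w_val=rng.normal(size=(d_model, d_hidden)),
        w_out=rng.normal(size=(d_hidden, d_model)),
        lam=rng.uniform(0.5, 0.99, size=(d_hidden, n_state)),  # |lam| < 1 keeps the state bounded
        b=rng.normal(size=(d_hidden, n_state)),
        c=rng.normal(size=(d_hidden, n_state)),
    )
    print(gated_ssm_layer(u, **params).shape)
```

Because the recurrence only looks backward and the gate is applied per position, the output at step k depends only on inputs up to step k, which is the property an autoregressive language model needs. The explicit Python loop is for readability; practical implementations typically compute the same recurrence as a convolution or a parallel scan.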
