Long Range Language Modeling via Gated State Spaces

Harsh Mehta

Ankit Gupta

Ashok Cutkosky

Behnam Neyshabur

Arxiv(2022)

Download Google Scholar

Abstract

State space models have shown to be effective for modeling long range dependencies, specifically on sequence classification tasks. In this paper we focus on autoregressive sequence modeling over natural language, Github code and ArXiv mathematics articles. Based on a few recent developments around effectiveness of gated activation functions, we propose a new layer, named Gated State Space (GSS) layer. We show that GSS trains significantly faster than the diagonal version of S4 (i.e. DSS) on TPUs, is simple to implement and fairly competitive with several well-tuned Transformer-based baselines. Finally, we show that interleaving traditional Transformer blocks with GSS improves performance even further.

Research Areas

Natural Language Processing
Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Long Range Language Modeling via Gated State Spaces

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Long Range Language Modeling via Gated State Spaces

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities