MLP-Mixer: An All-MLP Architecture for Vision

Ilya Tolstikhin

Neil Houlsby

Alexander Kolesnikov

Lucas Beyer

Xiaohua Zhai

Thomas Unterthiner

Jessica Yung

Andreas Steiner

Daniel Martin Keysers

Jakob Uszkoreit

Mario Lučić

Alexey Dosovitskiy

NeurIPS 2021 (poster)


Abstract

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well-established CNNs and Transformers.
