MAGE: Masked Generative Encoder to Unify Representation Learning and Image Synthesis

Tianhong Li; Huiwen Chang; Shlok Kumar Mishra; Han Zhang; Dina Katabi; Dilip Krishnan

CVPR 2023 (to appear)

Abstract

Generative modeling and representation learning are two key tasks in computer vision. However, models for these two tasks are typically trained independently, which ignores the potential for each task to help the other and adds training and model maintenance overhead.

In this work, we propose the Masked Generative Encoder (MAGE), the first framework to unify high-quality image generation and state-of-the-art self-supervised representation learning (nearly on par with supervised learning on ImageNet). Our key insight is that using variable mask ratios in masked image modeling pre-training allows generative training (very high mask ratio) and representation learning (lower mask ratio) under the same training framework. Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN as both inputs and outputs, and combines this with masking. We can further improve the representation by adding a contrastive loss to the encoder output. We extensively evaluate the generation and representation learning capabilities of MAGE.
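
The variable mask ratio idea can be illustrated with a small sketch. This is not the authors' implementation: the uniform sampling range, the 256-token sequence length, the 1024-entry codebook, and the names MASK_ID, sample_mask_ratio, and mask_tokens are all assumptions chosen for illustration. The sketch only shows how a different fraction of discrete token ids could be hidden on each training step, so that one model sees both heavily masked (generation-like) and more lightly masked (representation-like) inputs.

```python
import random

MASK_ID = -1  # hypothetical sentinel id standing in for a learned [MASK] token


def sample_mask_ratio(rng, low=0.5, high=1.0):
    """Draw a mask ratio uniformly from [low, high].

    The range and the uniform distribution are illustrative assumptions,
    not values taken from the abstract.
    """
    return rng.uniform(low, high)


def mask_tokens(tokens, ratio, rng):
    """Replace roughly a `ratio` fraction of token ids with MASK_ID.

    Returns the masked copy and the sorted list of masked positions.
    """
    n_mask = max(1, min(len(tokens), round(ratio * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions


rng = random.Random(0)
# Stand-in for the token ids a vector-quantized tokenizer might emit
# (assumed: 256 tokens per image, codebook of 1024 entries).
tokens = [rng.randrange(1024) for _ in range(256)]
ratio = sample_mask_ratio(rng)
masked, positions = mask_tokens(tokens, ratio, rng)
```

In a training loop built on this sketch, the model would be asked to predict the original ids at the masked positions; a fresh ratio would be drawn for every batch.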

On ImageNet-1K, MAGE obtains 9.10 FID in the task of class-unconditional image generation and 78.9% top-1 accuracy for linear probing, achieving state-of-the-art performance in both image generation and representation learning.

Research Areas

Machine perception
