Google Research

MaskSketch: Unpaired Structure-guided Masked Image Generation

CVPR 2023


Recent conditional image generation methods produce images of remarkable diversity, fidelity and realism. However, the majority of these methods allow conditioning only on labels or text prompts, which limits their level of control over the generation result. In this paper, we introduce MaskSketch, a masked image generation method that allows spatial conditioning of the generation result, using a guiding sketch as an extra conditioning signal during sampling. MaskSketch utilizes a pre-trained masked image generator, requires no model training or paired supervision, and works with input sketches of different levels of abstraction. We propose a novel parallel sampling scheme that leverages the structural information encoded in the intermediate self-attention maps of a masked generative transformer, such as scene layout and object shape. Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure. Evaluated on standard benchmark datasets, MaskSketch outperforms state-of-the-art methods for sketch-to-image translation, as well as generic image-to-image translation approaches.
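The abstract does not spell out the sampling procedure, so the following is only a toy sketch of what structure-guided parallel decoding with a masked generator could look like, not the paper's algorithm. All names here (`MASK`, `cosine_schedule`, `guided_parallel_decode`, `propose`, `structure_score`, `weight`) are hypothetical. `propose` stands in for a pre-trained masked generative transformer, and `structure_score` stands in for a comparison between the self-attention maps of the candidate image and those of the guiding sketch. The cosine masking schedule and the linear mix of the two scores are assumptions made for illustration.

```python
import math

MASK = -1  # hypothetical id marking a still-masked token position


def cosine_schedule(step, total_steps):
    """Fraction of tokens that should stay masked after `step` of `total_steps`."""
    return math.cos(math.pi / 2 * step / total_steps)


def guided_parallel_decode(num_tokens, propose, structure_score,
                           total_steps=8, weight=0.5):
    """Toy structure-guided parallel decoding (illustrative assumption only).

    propose(tokens) -> list of (token_id, confidence), one pair per position.
    structure_score(pos, token_id) -> similarity to the sketch, in [0, 1].
    """
    tokens = [MASK] * num_tokens
    for step in range(1, total_steps + 1):
        # Predict a candidate token for every position in parallel.
        proposals = propose(tokens)
        scored = []
        for pos in range(num_tokens):
            if tokens[pos] != MASK:
                continue  # already committed in an earlier step
            tok, conf = proposals[pos]
            # Mix model confidence with agreement to the guiding structure.
            score = (1 - weight) * conf + weight * structure_score(pos, tok)
            scored.append((score, pos, tok))
        if not scored:
            break
        # Commit the best-scoring candidates; the rest stay masked for now.
        keep_masked = math.floor(cosine_schedule(step, total_steps) * num_tokens)
        num_to_commit = max(1, len(scored) - keep_masked)
        scored.sort(reverse=True)
        for _, pos, tok in scored[:num_to_commit]:
            tokens[pos] = tok
    return tokens
```

In this sketch no model weights are updated: the guidance only changes which sampled tokens are kept at each step, which mirrors the abstract's claim that the method requires no model training or paired supervision.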
