Inferring Semantic Layout for Text-to-Image Synthesis

Seunghoon Hong
Dingdong Yang
Jongwook Choi
Honglak Lee
CVPR (2018)

Abstract

We propose a novel hierarchical approach to text-to-image synthesis by inferring semantic layout. Instead of learning a direct mapping from text to image, our algorithm decomposes the generation process into multiple steps: it first constructs a semantic layout from the text with a layout generator, and then converts the layout to an image with an image generator. The proposed layout generator progressively constructs a semantic layout in a coarse-to-fine manner by generating object bounding boxes and refining each box by estimating the object shape inside it. The image generator synthesizes an image conditioned on the inferred semantic layout, which provides a useful semantic structure of the image matching the text description. Our model not only generates semantically more meaningful images, but also allows automatic annotation of generated images and a user-controlled generation process through modification of the generated scene layout. We demonstrate the capability of the proposed model on the challenging MS-COCO dataset and show that it substantially improves image quality, interpretability of the output, and semantic alignment to the input text over existing approaches.
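To make the decomposition described in the abstract concrete, the following is a minimal sketch of the three-step pipeline (text → bounding boxes → object shapes → image). The module names (BoxGenerator, ShapeGenerator, ImageGenerator), their interfaces, and the mask aggregation step are illustrative assumptions for exposition, not the authors' released code; the actual model conditions each step as described in the paper.

```python
import torch
import torch.nn as nn


class HierarchicalTextToImage(nn.Module):
    """Sketch of the hierarchical generation pipeline (assumed interfaces)."""

    def __init__(self, box_generator: nn.Module,
                 shape_generator: nn.Module,
                 image_generator: nn.Module):
        super().__init__()
        self.box_generator = box_generator      # text embedding -> object boxes + classes
        self.shape_generator = shape_generator  # boxes + classes -> per-object masks
        self.image_generator = image_generator  # semantic layout + text -> image

    def forward(self, text_embedding: torch.Tensor):
        # Step 1 (coarse layout): predict a set of object bounding boxes
        # and their class labels from the text embedding.
        boxes, class_labels = self.box_generator(text_embedding)

        # Step 2 (fine layout): refine each box into an object shape by
        # estimating a binary mask inside the box; shape (B, num_objects, H, W).
        masks = self.shape_generator(boxes, class_labels)

        # Aggregate per-object masks into a single layout map
        # (simplified here; the paper builds a class-wise semantic layout).
        layout = masks.max(dim=1).values

        # Step 3: synthesize the final image conditioned on the inferred
        # layout and the text description.
        image = self.image_generator(layout, text_embedding)
        return image, boxes, masks
```

Because the layout is an explicit intermediate output, the same sketch also exposes the two properties highlighted above: the predicted boxes and masks serve as automatic annotations of the generated image, and a user can edit `boxes` or `masks` before Step 3 to control the generation.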