TryOnDiffusion: A Tale of Two U-Nets

Luyang Zhu; Dawei Yang; Tyler Zhu; Fitsum Reda; William Chan; Chitwan Saharia; Mohammad Norouzi; Ira Kemelmacher-Shlizerman


The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023, IEEE, pp. 1


Abstract

Given two images, one depicting a person and the other a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping it to accommodate significant changes in body pose and shape across the two subjects. Previous methods either focus on preserving garment details without effectively handling pose and shape variation, or allow try-on with the desired shape and pose but lack garment detail. In this paper, we propose a diffusion-based architecture that unifies two U-Nets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body changes in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
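The first key idea, warping the garment implicitly through cross attention, can be illustrated with a minimal sketch. This is not the authors' Parallel-UNet code. It assumes both feature maps are already flattened into token matrices and uses single-head scaled dot-product attention in NumPy; the function name `cross_attention`, the random projection matrices `w_q`, `w_k`, `w_v`, and all tensor sizes are hypothetical. Queries come from the person features and keys and values come from the garment features, so each person location gathers garment features from wherever they appear in the garment image, with no explicit warp field computed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Hypothetical single-head cross attention: person tokens attend to garment tokens.

    person_feats:  (N_p, d) flattened person feature map (queries)
    garment_feats: (N_g, d) flattened garment feature map (keys and values)
    w_q, w_k, w_v: (d, d_k) projection matrices (random here, learned in a real model)
    Returns the attended features (N_p, d_k) and the attention weights (N_p, N_g).
    """
    q = person_feats @ w_q
    k = garment_feats @ w_k
    v = garment_feats @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every person/garment token pair
    attn = softmax(scores, axis=-1)          # each row sums to 1 over garment tokens
    return attn @ v, attn                    # garment features "moved" to person locations

# Usage: an assumed 8x8 feature map (64 tokens) with 32 channels for each image.
rng = np.random.default_rng(0)
person = rng.standard_normal((64, 32))
garment = rng.standard_normal((64, 32))
w_q, w_k, w_v = (rng.standard_normal((32, 16)) / np.sqrt(32) for _ in range(3))
out, attn = cross_attention(person, garment, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because each row of `attn` sums to 1, the output at every person location is a convex combination of projected garment features, which is what lets attention act as a soft, learned correspondence in place of an explicit warp. A full network would typically add multiple heads, normalization and learned projections; the abstract does not specify those details.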
