Google Research

DreamPose: Fashion Video Synthesis with Stable Diffusion

ICCV (2023)


We present DreamPose, a diffusion-based method for generating fashion videos from still images. Given an image and a pose sequence, our method realistically animates both human and fabric motion as a function of body pose. Unlike past image-to-video approaches, we transform a pretrained text-to-image (T2I) Stable Diffusion model into a pose-guided video synthesis model, achieving high-quality results at a fraction of the computational cost of traditional video diffusion methods [13]. In our approach, we introduce a novel encoder architecture that enables Stable Diffusion to be conditioned directly on image embeddings, eliminating the need for intermediate text embeddings of any kind. We additionally demonstrate that concatenating target poses with the input noise is a simple yet effective means of conditioning the output frame on poses. Our quantitative and qualitative results show that DreamPose achieves state-of-the-art results on fashion video synthesis.
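As a rough illustration of the pose-conditioning idea described above, the sketch below stacks pose maps onto a noisy latent along the channel axis. It is not the authors' implementation: the function name `concat_pose_with_noise`, the 4-channel 64x64 latent, and the 10 pose channels are all assumptions chosen for the example, and NumPy stands in for a deep learning framework.

```python
import numpy as np


def concat_pose_with_noise(noisy_latent, pose_maps):
    """Stack pose maps onto a noisy latent along the channel axis.

    noisy_latent: array of shape (C, H, W), the noised input to the denoiser.
    pose_maps:    array of shape (P, H, W), a pose representation at the
                  same spatial resolution (hypothetical layout).
    Returns an array of shape (C + P, H, W).
    """
    if noisy_latent.shape[1:] != pose_maps.shape[1:]:
        raise ValueError("latent and pose maps must share spatial dimensions")
    return np.concatenate([noisy_latent, pose_maps], axis=0)


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))   # assumed latent shape
poses = rng.standard_normal((10, 64, 64))   # assumed number of pose channels
denoiser_input = concat_pose_with_noise(latent, poses)
print(denoiser_input.shape)  # (14, 64, 64)
```

With this kind of conditioning, the first layer of the denoising network has to accept the extra input channels (here 14 instead of 4), while the rest of a pretrained network can in principle be reused.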
