Self-Distilled StyleGAN: Towards Generation from Internet Photos

Ron Mokady
Michal Yarom
Michal Irani
Proceedings of the 49th Annual Conference on Computer Graphics and Interactive Techniques (2022)

Abstract

StyleGAN is known to produce high-fidelity images, while also offering unprecedented semantic editing. However, these fascinating abilities have been demonstrated only on a limited set of datasets, which are usually structurally aligned and well curated.
In this paper, we show how StyleGAN can be adapted to work on raw, uncurated images collected from the Internet. Such image collections pose two main challenges for StyleGAN: they contain many outlier images, and they are characterized by a multi-modal distribution. Training StyleGAN on such raw image collections results in degraded image synthesis quality. To meet these challenges, we propose a StyleGAN-based self-distillation approach, which consists of two main components: (i) a generative-based self-filtering of the dataset to eliminate out-of-distribution images and so produce an adequate training set, and (ii) perceptual clustering of the generated images to detect the inherent data modalities, which are then employed to improve StyleGAN’s “truncation trick” in the image synthesis process. The presented technique enables the generation of high-quality images while better preserving the diversity of the data. Through qualitative and quantitative evaluation, we demonstrate the power of our approach on new, challenging, and diverse domains collected from the Internet. New datasets and pre-trained models will be published upon acceptance.
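
The two components can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made for illustration: each training image is assumed to already have a scalar score (called `recon_errors` here, for example an error obtained by reconstructing the image with the generator), the cluster centers are assumed to be given as latent-space vectors, and the nearest center is chosen by Euclidean distance. The function names `self_filter` and `multimodal_truncation` and the parameters `keep_fraction` and `psi` are hypothetical. The standard truncation trick interpolates a latent code toward a single mean latent; the sketch interpolates toward the nearest of several centers instead.

```python
import numpy as np


def self_filter(recon_errors, keep_fraction=0.8):
    """Return indices of the images to keep for training.

    Assumption: a lower score means the image is better explained by the
    generator, so images above the keep_fraction quantile are treated as
    outliers and dropped.
    """
    recon_errors = np.asarray(recon_errors, dtype=float)
    threshold = np.quantile(recon_errors, keep_fraction)
    return np.flatnonzero(recon_errors <= threshold)


def multimodal_truncation(w, centers, psi=0.5):
    """Pull latent code w toward its nearest cluster center.

    psi = 1 leaves w unchanged; psi = 0 collapses w onto the center.
    Assumption: nearest center is picked by Euclidean distance.
    """
    w = np.asarray(w, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dists = np.linalg.norm(centers - w, axis=1)
    nearest = centers[np.argmin(dists)]
    return nearest + psi * (w - nearest)


if __name__ == "__main__":
    # Toy scores for five images; the highest-error ones are filtered out.
    kept = self_filter([0.1, 0.9, 0.2, 0.3, 0.5], keep_fraction=0.6)
    print("kept indices:", kept)

    # Two toy modes in a 2-D latent space.
    centers = [[0.0, 0.0], [10.0, 10.0]]
    print("truncated:", multimodal_truncation([9.0, 11.0], centers, psi=0.5))
```

With a single center this reduces to the usual truncation trick; with several centers, a sample is pulled toward its own mode rather than toward a global mean, which is the intuition behind preserving diversity in multi-modal data.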
