Explaining in Style: Training a GAN to explain a classifier in StyleSpace

Yossi Gandelsman
Michal Yarom
Yoav Itzhak Wald
Phillip Isola
Michal Irani
Proc. ICCV 2021
Google Scholar

Abstract

Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is the S-space of StyleGAN, which is known to generate semantically meaningful dimensions in the image. However, these will typically not correspond to classifier-specific attributes since standard GAN training is not dependent on the classifier. To overcome this, we propose training procedure for
a StyleGAN, which incorporates the classifier model. This results in an S-space that captures distinct attributes underlying classifier outputs. After training, the model can be used to visualize the effect of changing multiple attributes per image, thus providing an image-specific explanation. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be changed in different ways to change its classifier prediction.
Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are interpretable as measured in user-studies.