Detecting Bias with Generative Counterfactual Face Attribute Augmentation
Abstract
We introduce a simple framework for identifying biases in a smiling attribute classifier. Our method poses counterfactual questions of the form: how would the prediction change if this face characteristic had been different? We leverage recent advances in generative adversarial networks to build a realistic generative model of faces that affords controlled manipulation of specific facial characteristics. Empirically, we identify several factors of variation (that we believe should be independent of smiling) that affect the predictions of a smiling classifier trained on CelebA.
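The counterfactual question above can be illustrated with a minimal sketch. It assumes the generative model exposes a latent vector and that a facial characteristic can be changed by shifting that vector along a fixed direction; this is one common way to get controlled manipulation, not necessarily the mechanism used here. The names `counterfactual_sensitivity`, `toy_generate`, and `toy_classify` are hypothetical, and the toy generator and classifier stand in for a trained GAN and a CelebA smiling classifier.

```python
import math
import random


def counterfactual_sensitivity(generate, classify, latents, direction, step=1.0):
    """Mean change in classifier score when each latent is shifted along `direction`.

    A value far from zero for an attribute that should be unrelated to
    smiling suggests the classifier is sensitive to that attribute.
    """
    deltas = []
    for z in latents:
        # Counterfactual latent: same face, one characteristic changed (assumed linear shift).
        z_cf = [zi + step * di for zi, di in zip(z, direction)]
        deltas.append(classify(generate(z_cf)) - classify(generate(z)))
    return sum(deltas) / len(deltas)


def toy_generate(z):
    # Hypothetical stand-in for a GAN generator: the "image" is just the latent.
    return list(z)


def toy_classify(x):
    # Hypothetical stand-in for a smiling classifier. It depends on x[0]
    # (the intended smile signal) but also leaks a dependence on x[1].
    logit = 2.0 * x[0] + 0.5 * x[1]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
latents = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]

nuisance_dir = [0.0, 1.0, 0.0]  # attribute the classifier should ignore but does not
unused_dir = [0.0, 0.0, 1.0]    # attribute the classifier truly ignores

bias = counterfactual_sensitivity(toy_generate, toy_classify, latents, nuisance_dir)
control = counterfactual_sensitivity(toy_generate, toy_classify, latents, unused_dir)
print(f"nuisance attribute shift: {bias:+.4f}")
print(f"unused attribute shift:   {control:+.4f}")
```

In this toy setup the nuisance direction produces a positive mean shift in the predicted smiling score, while the unused direction produces none. With a real generator and classifier, the same comparison would be run per attribute, and the quality of the conclusion depends on how cleanly the generator isolates each characteristic.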