Counterfactual Fairness in Text Classification through Robustness

Sahaj Garg; Vincent Perot; Nicole Limtiaco; Ankur Taly; Ed H. Chi; Alex Beutel

Counterfactual Fairness in Text Classification through Robustness

Sahaj Garg

Vincent Perot

Nicole Limtiaco

Ankur Taly

Ed H. Chi

Alex Beutel

AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (2019)

Download Google Scholar

Abstract

In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute referenced in the example were different? Toxicity classifiers demonstrate a counterfactual fairness issue by predicting that "Some people are gay'' is toxic while "Some people are straight'' is nontoxic. We offer a metric, counterfactual token fairness (CTF), for measuring this particular form of fairness in text classifiers, and describe its relationship with group fairness. Further, we offer three approaches, blindness, counterfactual augmentation, and counterfactual logit pairing (CLP), for optimizing counterfactual token fairness during training, bridging the robustness and fairness literature. Empirically, we find that blindness and CLP address counterfactual token fairness. The methods do not harm classifier performance, and have varying tradeoffs with group fairness. These approaches, both for measurement and optimization, provide a new path forward for addressing fairness concerns in text classification.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Counterfactual Fairness in Text Classification through Robustness

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs