Measuring and Mitigating Unintended Bias in Text Classification

Lucas Dixon; John Li; Jeffrey Sorensen; Nithum Thain; Lucy Vasserman

AAAI/ACM Conference on AI, Ethics, and Society (2018)

Abstract

We introduce and illustrate a new approach to measuring and
mitigating unintended bias in machine learning models. Our
definition of unintended bias is parameterized by a test set
and a subset of input features. We illustrate how this can
be used to evaluate text classifiers using a synthetic test set
and a public corpus of comments annotated for toxicity from
Wikipedia Talk pages. We also demonstrate how imbalances
in training data can lead to unintended bias in the resulting
models, and therefore potentially unfair applications. We use
a set of common demographic identity terms as the subset of
input features on which we measure bias. This technique permits
analysis in the common scenario where demographic information
on authors and readers is unavailable, so that bias
mitigation must focus on the content of the text itself. The
mitigation method we introduce is an unsupervised approach
based on balancing the training dataset. We demonstrate that
this approach reduces the unintended bias without compromising
overall model quality.
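
As a rough illustration of measuring bias with respect to a test set and a subset of input features, the sketch below builds a small synthetic test set by filling sentence templates with identity terms, then compares false positive rates across terms. Everything in it is an assumption made for illustration and is not taken from the paper: the term list, the templates, the keyword-based `toy_classifier`, and the choice of false positive rate spread as the summary statistic.

```python
from collections import defaultdict

# Hypothetical identity terms and templates, chosen only for illustration.
IDENTITY_TERMS = ["gay", "straight", "muslim", "christian"]
TEMPLATES = [
    ("I am a {} person", 0),          # label 0: non-toxic
    ("Being {} is wonderful", 0),     # label 0: non-toxic
    ("I hate all {} people", 1),      # label 1: toxic
    ("{} people are disgusting", 1),  # label 1: toxic
]


def build_synthetic_test_set(terms, templates):
    """Fill every template with every term so each term sees identical contexts."""
    return [
        (template.format(term), label, term)
        for term in terms
        for template, label in templates
    ]


def toy_classifier(text):
    """Stand-in model (an assumption): flags toxic keywords, and also has a
    spurious association with one identity term to simulate unintended bias."""
    lowered = text.lower()
    return int(any(word in lowered for word in ("hate", "disgusting", "gay")))


def per_term_false_positive_rate(examples, predict):
    """False positive rate on the non-toxic examples, computed per identity term."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for text, label, term in examples:
        if label == 0:
            negatives[term] += 1
            false_positives[term] += predict(text)
    return {term: false_positives[term] / negatives[term] for term in negatives}


def fpr_spread(rates):
    """Sum of absolute gaps between each term's rate and the mean rate.
    Zero means the model treats all terms alike on this test set."""
    mean_rate = sum(rates.values()) / len(rates)
    return sum(abs(mean_rate - rate) for rate in rates.values())


if __name__ == "__main__":
    examples = build_synthetic_test_set(IDENTITY_TERMS, TEMPLATES)
    rates = per_term_false_positive_rate(examples, toy_classifier)
    for term, rate in rates.items():
        print(f"{term}: FPR = {rate:.2f}")
    print(f"spread = {fpr_spread(rates):.2f}")
```

Because every term appears in the same templates, any difference in per-term error rates can be attributed to the term itself rather than to the surrounding text. A real evaluation would substitute a trained model for `toy_classifier` and a much larger set of templates and terms.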
