On Measurements of Bias and Fairness in NLP
Abstract
Recent studies show that Natural Language Processing (NLP) models propagate societal biases about protected attributes such as gender, race, and nationality.
While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand the biases and the normative harms these measures capture, and how different measures compare with one another. To address this gap, this work presents a comprehensive survey of existing bias measures in NLP---both intrinsic measures of representations and extrinsic measures of downstream applications---and organizes them by their associated NLP tasks, metrics, datasets, societal biases, and corresponding harms. This survey also organizes commonly used NLP fairness metrics into categories, presenting their advantages, disadvantages, and correlations with the general fairness metrics common in machine learning.