- Nikil Selvam
- Sunipa Dev
- Daniel Khashabi
- Tushar Khot
- Kai-Wei Chang
Abstract
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given model? In this work, we study this question by contrasting social biases with non-social biases that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender (Rudinger et al., 2019) and BiasNLI (Dev et al., 2020)), we observe that the choice of these shallow modifications has a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
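The following is a minimal sketch (not from the paper) of how one might probe a benchmark's sensitivity to innocuous modifications such as random sub-sampling; `bias_score` stands in for whatever bias metric a given benchmark defines, and the helper names are hypothetical.

```python
# Illustrative sketch: score the same model on several innocuously modified
# (randomly sub-sampled) constructions of one benchmark and summarize the spread.
# `examples` is the list of benchmark instances; `bias_score` is a hypothetical
# callable that returns a bias measurement for a set of instances.
import random
import statistics

def random_subsample(examples, fraction=0.8, seed=0):
    """Return a random subset of the benchmark that keeps its social-bias content."""
    rng = random.Random(seed)
    k = int(len(examples) * fraction)
    return rng.sample(examples, k)

def bias_across_constructions(examples, bias_score, n_constructions=10):
    """Measure how much the bias score varies across shallowly modified benchmark versions."""
    scores = [bias_score(random_subsample(examples, seed=s)) for s in range(n_constructions)]
    return statistics.mean(scores), statistics.stdev(scores)
```

A large spread across these constructions would suggest the benchmark's score depends heavily on shallow dataset-construction choices rather than on the model's underlying social bias.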