The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
Abstract
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given model?
In this work, we study this question by contrasting social biases with \underline{non}-social biases that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions of a given benchmark based on innocuous modifications (such as paraphrasing or random sampling) that maintain the essence of its social bias.
On two well-known social bias benchmarks, Winogender (Rudinger et al., 2019) and BiasNLI (Dev et al., 2020), we observe that the choice of these shallow modifications has a surprising effect on the resulting degree of bias across various models.
We hope these troubling observations motivate more robust measures of social biases.
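To make the random-sampling idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical per-instance indicator (1 if a model's prediction is stereotype-consistent, 0 otherwise) and a hypothetical bias score defined as the mean of those indicators. Alternative benchmark constructions are simulated by drawing random subsets of the instance pool, and the spread of the resulting scores is reported. The names `bias_score` and `resampled_scores` and the toy data are illustrative assumptions only. They are not taken from Winogender or BiasNLI.

```python
import random
import statistics
from typing import List, Sequence


def bias_score(indicators: Sequence[int]) -> float:
    """Hypothetical bias score: fraction of instances whose prediction is
    stereotype-consistent (1) rather than not (0)."""
    return sum(indicators) / len(indicators)


def resampled_scores(indicators: Sequence[int], subset_size: int,
                     n_constructions: int, seed: int = 0) -> List[float]:
    """Simulate alternative benchmark constructions by drawing random subsets
    (without replacement) of the full instance pool and scoring each one."""
    rng = random.Random(seed)  # seeded so the simulated constructions are reproducible
    pool = list(indicators)
    scores = []
    for _ in range(n_constructions):
        subset = rng.sample(pool, subset_size)
        scores.append(bias_score(subset))
    return scores


if __name__ == "__main__":
    # Toy per-instance indicators generated at random; NOT real benchmark data.
    toy_rng = random.Random(42)
    pool = [1 if toy_rng.random() < 0.55 else 0 for _ in range(1000)]
    scores = resampled_scores(pool, subset_size=100, n_constructions=50)
    print(f"full-pool score: {bias_score(pool):.3f}")
    print(f"resampled range: {min(scores):.3f} to {max(scores):.3f}")
    print(f"resampled std:   {statistics.stdev(scores):.3f}")
```

A wide range across subsets of the same underlying pool would suggest that a single reported score partly reflects how the benchmark happened to be constructed. Paraphrasing-based constructions would need a different generator, which this sketch does not attempt.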