Google Research

Avoiding Spurious Correlations: Bridging Theory and Practice

NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications

Abstract

Distribution shifts in the wild jeopardize the performance of machine learning models, which tend to pick up spurious correlations during training. Recent work \cite{nagarajan2020understanding} has characterized two specific failure modes of out-of-distribution (OOD) generalization. We extend this theoretical framework by interpreting existing algorithms as solutions to these failure modes. We then evaluate these algorithms on several image classification datasets, and in the process surface two issues that are central to existing robustness techniques. For techniques that rely on group annotations, we show that the group information in standard benchmark datasets does not fully capture the spurious correlations present. For techniques that do not require group annotations, the validation set used for model selection still carries assumptions that are unrealistic in real-world settings, and we show how the choice of distribution shift in the validation set can affect the performance of different OOD algorithms.
