Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Saurabh Garg; Sivaraman Balakrishnan; Zachary Chase Lipton; Behnam Neyshabur; Hanie Sedghi

ICLR (2022)

Abstract

Distribution shift is a prevalent problem in the real-world deployment of machine learning models. Typically, a mismatch between the source (training) and target (test) distributions leads to a gap between the model's source and target performance. In this work, we investigate methods that leverage only unlabeled target data to predict accuracy under distribution shift. We propose a simple and effective method called Average Thresholded Confidence (ATC) that learns a scalar threshold on model confidence using source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold. ATC outperforms previous approaches across several model architectures and various types of distribution shift (e.g., synthetic corruptions, shifts due to dataset reproduction, or shifts due to novel subpopulations) applied to the FMoW-WILDS, ImageNet, CIFAR, and MNIST datasets. ATC estimates target performance up to 2-3× more accurately than recently proposed methods. Finally, we theoretically analyze our proposed method on a toy distribution shift model with varying degrees of spurious correlation.
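The thresholding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the confidence score is the model's maximum softmax probability, and that the threshold is chosen so that the fraction of labeled source examples above it roughly matches the source accuracy; the abstract states neither detail. The function names `fit_atc_threshold` and `predict_target_accuracy` and all the numbers are hypothetical.

```python
from typing import Sequence


def fit_atc_threshold(source_conf: Sequence[float],
                      source_correct: Sequence[bool]) -> float:
    """Choose a threshold on labeled source data (assumed rule, see lead-in).

    The threshold is set so that the share of source examples with
    confidence strictly above it roughly equals the source accuracy.
    With tied confidence values the match is only approximate.
    """
    if len(source_conf) != len(source_correct) or len(source_conf) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    num_errors = len(source_correct) - sum(bool(c) for c in source_correct)
    if num_errors == 0:
        # Perfect source accuracy: every example should count as "above".
        return float("-inf")
    ordered = sorted(source_conf)
    # The num_errors lowest-confidence examples sit at or below the threshold.
    return ordered[num_errors - 1]


def predict_target_accuracy(target_conf: Sequence[float],
                            threshold: float) -> float:
    """Estimate target accuracy as the fraction of unlabeled target
    examples whose confidence exceeds the threshold."""
    if len(target_conf) == 0:
        raise ValueError("target_conf must be non-empty")
    return sum(c > threshold for c in target_conf) / len(target_conf)


# Made-up illustrative values: five labeled source examples, four unlabeled
# target examples.
source_conf = [0.9, 0.8, 0.6, 0.55, 0.95]
source_correct = [True, True, False, False, True]
target_conf = [0.7, 0.5, 0.65, 0.4]

threshold = fit_atc_threshold(source_conf, source_correct)
estimate = predict_target_accuracy(target_conf, threshold)
```

In this toy example the source accuracy is 3/5, so the threshold lands on the second-lowest source confidence, and the estimate is the share of target confidences above it. No target labels are used at any point, which is the property the abstract emphasizes.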

Research Areas

Machine intelligence
