Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Saurabh Garg
Sivaraman Balakrishnan
Zachary Chase Lipton
ICLR (2022)

Abstract

Distribution shift is a prevalent problem in the real-world deployment of machine learning models. Typically a mismatch between the source (training) and target (test) distribution leads to a gap between the source and target performance of the model. In this work, we investigate methods that leverage only unlabeled target data to predict accuracy under distribution shift. We propose a simple and effective method called Average Thresholded Confidence (ATC) that learns a scalar \emph{threshold} on model confidence on source data and predicts model performance as the average number of unlabeled target examples above the identified threshold. ATC outperforms previous approaches across several model architectures and various types of distribution shifts (e.g. synthetic corruptions, shifts due to dataset reproduction, or shifts due to novel subpopulations) applied to FMoW-\textsc{wilds}, ImageNet, CIFAR, and MNIST datasets. ATC estimates target performance up to $2\text{--}3\times$ more accurately compared to recently proposed methods. Finally, we theoretically analyze our proposed method on a toy distribution shift model with varying degrees of spurious correlation.

Research Areas