Cold Posteriors and Aleatoric Uncertainty
Abstract
Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect). To help interpret this phenomenon, we argue that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets. This problem is particularly pronounced in academic benchmarks like MNIST or CIFAR, for which the quality of the labels is high. For the special case of Gaussian process regression, any positive temperature corresponds to a valid posterior under a modified prior, and tuning this temperature is directly analogous to empirical Bayes. On classification tasks, there is no direct equivalence between modifying the prior and tuning the temperature; however, reducing the temperature can lead to models that better reflect our belief that one gains little information by relabeling existing examples in the training set. Therefore, although cold posteriors do not always correspond to an exact inference procedure, we believe they may often better reflect our true prior beliefs.
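To make the Gaussian process claim concrete, the following sketch (not from the paper; the kernel, data, and temperature value are illustrative assumptions) numerically checks that tempering a GP regression posterior at any temperature T > 0 gives the same posterior as exact inference under a prior covariance scaled by T and a noise variance scaled by T.

```python
# A minimal sketch (not from the paper) of the claim that, for Gaussian process
# regression, a tempered posterior at any temperature T > 0 coincides with the
# exact posterior under a rescaled prior and noise variance. The kernel, data,
# and hyperparameter values below are illustrative assumptions.
import numpy as np

def rbf_kernel(x, lengthscale=0.7, variance=1.0, jitter=1e-6):
    """Squared-exponential kernel matrix for 1-D inputs x (with jitter for stability)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(x))

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=10)
y = np.sin(x) + 0.1 * rng.standard_normal(10)

K = rbf_kernel(x)          # prior covariance of the latent function at the inputs
sigma2 = 0.1 ** 2          # observation noise variance
T = 0.3                    # posterior "temperature"; T < 1 gives a cold posterior
I = np.eye(len(x))

def gp_posterior(K, sigma2, y):
    """Mean and covariance of p(f | y) at the training inputs (precision form)."""
    cov = np.linalg.inv(np.linalg.inv(K) + I / sigma2)
    mean = cov @ (y / sigma2)
    return mean, cov

# Tempered posterior: log-likelihood and log-prior are both scaled by 1/T,
# so the posterior precision becomes K^{-1}/T + I/(T * sigma2).
cov_T = np.linalg.inv(np.linalg.inv(K) / T + I / (T * sigma2))
mean_T = cov_T @ (y / (T * sigma2))

# Exact (untempered) posterior under the modified prior T*K and noise T*sigma2.
mean_mod, cov_mod = gp_posterior(T * K, T * sigma2, y)

print(np.allclose(mean_T, mean_mod), np.allclose(cov_T, cov_mod))  # True True
```

Under this reparameterisation, tuning T on held-out data amounts to tuning GP hyperparameters, which is why the abstract describes temperature tuning in this setting as directly analogous to empirical Bayes.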