
On Ensembles, I-Optimality, and Active Learning

William D Heavlin
Journal of Statistical Theory and Practice (2021)


We consider the active learning problem for a supervised learning model: after training a black-box model on a given dataset, we determine which (large batch of) unlabeled candidates to label in order to improve the model further. We concentrate on the large-batch case because it aligns best with most machine learning applications and because it is theoretically richer. Our approach blends two key ideas: (1) We quantify model uncertainty with jackknife-like 50-percent sub-samples (“half-samples”). (2) To select which n of C candidates to label, we consider (a rank-(M − 1) estimate of) the associated C × C prediction covariance matrix, which has good properties. We illustrate by fitting a deep neural network to about 20 percent of the CIFAR-10 image dataset. The statistical efficiency we achieve is more than 3× that of random selection.
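The two ideas in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes M denotes the number of half-sample models, substitutes a toy least-squares model for the deep network, and uses a simple greedy rule (pick the candidate whose labeling would most reduce the total predictive variance, then deflate the covariance) as one plausible stand-in for an I-optimality-style selection. The function names `half_sample_indices`, `prediction_covariance`, and `greedy_select`, and the `noise` parameter, are hypothetical.

```python
import numpy as np

def half_sample_indices(n_train, n_models, rng):
    """Return n_models index arrays, each a random 50-percent sub-sample."""
    half = n_train // 2
    return [rng.choice(n_train, size=half, replace=False) for _ in range(n_models)]

def prediction_covariance(preds):
    """preds has shape (M, C): row m holds model m's predictions on C candidates.

    Returns the C x C sample covariance across models; because the rows are
    centered, its rank is at most M - 1.
    """
    centered = preds - preds.mean(axis=0, keepdims=True)
    return centered.T @ centered / (preds.shape[0] - 1)

def greedy_select(cov, n_select, noise=1e-6):
    """Greedy selection (an assumption, not the paper's procedure).

    At each step, choose the candidate j that maximizes the drop in total
    variance, sum_i cov[i, j]**2 / (cov[j, j] + noise), then condition on it.
    """
    cov = cov.copy()
    available = np.ones(cov.shape[0], dtype=bool)
    chosen = []
    for _ in range(n_select):
        scores = (cov ** 2).sum(axis=0) / (np.diag(cov) + noise)
        scores[~available] = -np.inf
        j = int(np.argmax(scores))
        chosen.append(j)
        available[j] = False
        col = cov[:, j].copy()
        cov -= np.outer(col, col) / (col[j] + noise)  # rank-one deflation
    return chosen

# Toy illustration: a linear model stands in for the black-box learner.
rng = np.random.default_rng(0)
n_train, n_features, M, C, n = 200, 10, 8, 50, 5
X = rng.normal(size=(n_train, n_features))
y = X @ rng.normal(size=n_features) + 0.5 * rng.normal(size=n_train)
X_cand = rng.normal(size=(C, n_features))  # unlabeled candidates

# Fit one model per half-sample and collect its candidate predictions.
preds = np.stack([
    X_cand @ np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    for idx in half_sample_indices(n_train, M, rng)
])

cov = prediction_covariance(preds)  # C x C, rank at most M - 1
chosen = greedy_select(cov, n)      # indices of the n candidates to label
print(chosen)
```

In a realistic large-batch setting one would likely avoid forming the full C × C matrix and work with the centered M × C prediction matrix directly, since the covariance estimate has rank at most M − 1.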