On Ensembles, I-Optimality, and Active Learning
Abstract
We consider the active learning problem for a supervised learning
model: that is, after training a black-box model on a given dataset, we determine which (large batch of) unlabeled candidates to label in order to improve
the model further.
We concentrate on the large-batch case because it is aligned with most
machine learning applications, and because it is more theoretically rich.
Our approach blends two key ideas: (1) We quantify model uncertainty with
jackknife-like 50-percent sub-samples ("half-samples"). (2) To select which n of
C candidates to label, we consider a rank-(M − 1) estimate (where M is the number
of half-samples) of the associated C × C prediction covariance matrix, which has good properties.
We illustrate by fitting a deep neural network to about 20 percent of the
CIFAR-10 image dataset. The statistical efficiency we achieve is more than
3× that of random selection.
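
To make idea (2) concrete, the short sketch below is a minimal illustration under our own assumptions, not the paper's actual pipeline: it shows how predictions from M models, each fit to an independent half-sample, yield a C × C prediction covariance estimate of rank at most M − 1. The function and variable names, and the use of NumPy, are hypothetical choices for illustration.

import numpy as np

def half_sample_prediction_covariance(preds):
    # preds: (M, C) array of predictions on the same C unlabeled candidates,
    # one row per model trained on an independent 50-percent sub-sample.
    M, C = preds.shape
    centered = preds - preds.mean(axis=0, keepdims=True)  # rows now sum to ~0
    return centered.T @ centered / (M - 1)                # (C, C), rank <= M - 1

# Toy usage: 8 half-sample models, 50 candidates.
rng = np.random.default_rng(0)
preds = rng.normal(size=(8, 50))
cov = half_sample_prediction_covariance(preds)
print(np.linalg.matrix_rank(cov))  # at most M - 1 = 7

# A crude placeholder for the selection step: take the n candidates with the
# largest predictive variance. The I-optimality-motivated criterion referred to
# in the title uses the full covariance matrix, not just its diagonal.
n = 5
chosen = np.argsort(np.diag(cov))[-n:]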