Batch Active Learning at Scale

Afshin Rostamizadeh; Anand Rajagopalan; Claudio Gentile; Giulia DeSalvo; Gui Citovsky; Laz Karydas; Sanjiv Kumar

NeurIPS 2021

Abstract

Training complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm that focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies and provides significant improvements in model training efficiency compared to recent baselines.
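
To make the idea of combining uncertainty and diversity concrete, here is a minimal sketch of one generic way to select a batch. It is not the algorithm analyzed in the paper, which the abstract does not specify. It assumes uncertainty is measured by the margin between the two highest predicted class probabilities, and diversity is encouraged by greedy farthest-point selection over example embeddings. The names `margin_scores`, `select_batch`, and `pool_factor` are hypothetical and chosen only for illustration.

```python
import numpy as np


def margin_scores(probs):
    """Top-1 minus top-2 class probability per row; smaller means more uncertain."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def select_batch(probs, embeddings, batch_size, pool_factor=3):
    """Pick a batch of unlabeled examples that are uncertain and spread out.

    Assumed inputs (illustrative, not from the paper):
      probs      -- (n, num_classes) predicted class probabilities
      embeddings -- (n, d) feature vectors for the same n examples
    """
    # Uncertainty step: keep the lowest-margin examples as candidates.
    margins = margin_scores(probs)
    n_candidates = min(len(margins), pool_factor * batch_size)
    candidates = np.argsort(margins)[:n_candidates]
    cand_emb = embeddings[candidates]

    # Diversity step: start from the most uncertain candidate, then repeatedly
    # add the candidate farthest from everything chosen so far.
    chosen = [0]
    dists = np.linalg.norm(cand_emb - cand_emb[0], axis=1)
    dists[0] = -1.0  # mark as already chosen
    while len(chosen) < min(batch_size, n_candidates):
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(cand_emb - cand_emb[nxt], axis=1))
        dists[nxt] = -1.0  # mark as already chosen
    return candidates[np.array(chosen)]
```

In this sketch, a near-duplicate of an already selected example has a small distance to the chosen set, so it is passed over in favor of a more distant candidate. This is the redundancy risk the abstract describes for large batches. The pairwise distance computation here is only suitable for small pools; it does not attempt to address the 100K-1M batch sizes discussed above.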

Research Areas

Machine intelligence
