Large-Scale Bandit Problems and KWIK Learning

Jacob Abernethy; Kareem Amin; Moez Draief; Michael Kearns

Proceedings of the 30th International Conference on Machine Learning (2013)

Abstract

We show that parametric multi-armed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as “Knows What It Knows” or KWIK learning. We give matching impossibility results showing that the KWIK-learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results both in the standard parametric MAB setting and in a new model in which the action space is finite but growing with time.
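
As background for readers unfamiliar with KWIK learning, the protocol can be sketched as follows: on each input the learner either commits to a prediction or answers "don't know", and it sees the true label only when it answers "don't know". The sketch below is an illustration under stated assumptions, not the reduction from the paper: it assumes a finite hypothesis class and noise-free labels, and the names `EnumerationKWIK`, `run_demo`, and the threshold class are hypothetical.

```python
from typing import Callable, Hashable, List, Optional

# A hypothesis maps an input to a label (hypothetical type alias).
Hypothesis = Callable[[Hashable], int]


class EnumerationKWIK:
    """KWIK-style learner for a finite hypothesis class, assuming noise-free labels."""

    def __init__(self, hypotheses: List[Hypothesis]) -> None:
        # Hypotheses still consistent with every label seen so far.
        self.version_space = list(hypotheses)

    def predict(self, x: Hashable) -> Optional[int]:
        # Commit to a prediction only when all surviving hypotheses agree on x.
        outputs = {h(x) for h in self.version_space}
        if len(outputs) == 1:
            return outputs.pop()
        return None  # "don't know"

    def update(self, x: Hashable, y: int) -> None:
        # Called only after a "don't know": drop hypotheses that contradict y.
        self.version_space = [h for h in self.version_space if h(x) == y]


def run_demo():
    # Hypothetical class: threshold functions h_t(x) = 1 if x >= t else 0.
    hypotheses = [lambda x, t=t: int(x >= t) for t in range(5)]
    target = hypotheses[2]
    learner = EnumerationKWIK(hypotheses)
    dont_knows = 0
    mistakes = 0
    for x in [0, 4, 2, 1, 3, 2, 1, 0]:
        guess = learner.predict(x)
        if guess is None:
            dont_knows += 1
            learner.update(x, target(x))  # label revealed only on "don't know"
        elif guess != target(x):
            mistakes += 1
    return dont_knows, mistakes


if __name__ == "__main__":
    print(run_demo())
```

Under these assumptions every "don't know" eliminates at least one hypothesis, so the learner answers "don't know" at most one fewer times than the size of the class and never commits to a wrong prediction when the target is in the class.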

Research Areas

Machine intelligence
