Large-Scale Bandit Problems and KWIK Learning
Abstract
We show that parametric multi-armed bandit (MAB) problems with large state and action
spaces can be algorithmically reduced to the supervised learning model known as “Knows
What It Knows” or KWIK learning. We give matching impossibility results showing that the KWIK-learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action
space is finite but growing with time.