New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

Phil Long
Theoretical Computer Science, 808 (2020), pp. 159-163

Abstract

We study two generalizations of the
mistake bound model to online multiclass classification.
In the standard model, the learner receives the
correct classification at the end of each round, and
in the bandit model, the learner only finds out whether
its prediction was correct or not. For a set F of multiclass
classifiers, let opt_s(F) and opt_b(F) be the optimal bounds
for learning F in these two models. We show that the bound
opt_b(F) < (1 + o(1)) (|Y| ln |Y|) opt_s(F) is best possible
up to the leading constant, closing a Θ(log |Y|) factor gap
between the previous upper and lower bounds.
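To make the two feedback models concrete, here is a minimal toy sketch (not from the paper; the three-classifier class F, the round schedule, and the "predict with any surviving classifier" rule are all hypothetical) contrasting what a simple version-space learner can infer from standard versus bandit feedback:

```python
# Toy illustration (hypothetical; not from the paper): a finite class F of
# three classifiers on inputs {0, 1, 2} with label set Y = {0, 1, 2}.
TABLES = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
F = [lambda x, t=t: t[x] for t in TABLES]
target = F[2]  # the adversary's hidden classifier

def run(feedback, rounds=9):
    """Count mistakes of a naive version-space learner under one feedback model."""
    version_space = list(F)  # classifiers not yet ruled out
    mistakes = 0
    for t in range(rounds):
        x = t % 3
        prediction = version_space[0](x)  # predict with any surviving classifier
        correct_label = target(x)
        if prediction != correct_label:
            mistakes += 1
        if feedback == "standard":
            # Standard model: the true label is revealed, so every
            # inconsistent classifier is eliminated at once.
            version_space = [h for h in version_space if h(x) == correct_label]
        elif prediction == correct_label:
            # Bandit model, prediction confirmed correct: keep agreeing classifiers.
            version_space = [h for h in version_space if h(x) == prediction]
        else:
            # Bandit model, prediction wrong: we only learn the predicted
            # label is not the true one.
            version_space = [h for h in version_space if h(x) != prediction]
    return mistakes

print(run("standard"), run("bandit"))  # standard: 1 mistake, bandit: 2
```

On this toy instance the standard-model learner discards every inconsistent classifier the moment the true label is revealed, while the bandit learner can only rule out classifiers that agree with a wrong prediction, so it may pay extra mistakes; the |Y| ln |Y| factor in the abstract quantifies the worst-case price of this weaker feedback.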