New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

Phil Long
Theoretical Computer Science, 808(2020), pp. 159-163


We study two generalizations of the mistake bound model to online multiclass classification. In the standard model, the learner receives the correct classification at the end of each round, and in the bandit model, the learner only finds out whether its prediction was correct or not. For a set F of multiclass classifiers, let opts(F) and optb(F) be the optimal bounds for learning F according to these two models. We show that an optb(F) < (1 + o(1)) (|Y| ln |Y|) opts(F) bound is the best possible up to the leading constant, closing a Theta(log |Y|) factor gap.