Bandits with Adversarial Scaling

Thodoris Lykouris
ICML'20 (2020)

Abstract

We study an intermediary model between stochastic and adversarial bandits, which
we term adversarial scaling, where the rewards are a product between a stochastic
component and an adversarial component shared by all arms. This can be viewed as
a model where the ratios between the mean rewards remain fixed, but the magnitude
of rewards is rescaled by an adaptive adversary. We obtain logarithmic regret
bounds for this setting. As a by-product we improve the regret of purely stochastic
bandits in the special case where all the means are small. Finally, we show that
classical algorithms such as UCB and Thompson Sampling are susceptible to
adversarial scaling attacks.
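The reward model described above can be sketched in a few lines of Python. This is a hedged illustration, not code from the paper: each arm keeps a fixed stochastic mean, an adversary picks a per-round scaling shared by all arms, and the realized reward is a Bernoulli draw with the rescaled mean. The arm names, means, and adversary schedule below are illustrative assumptions.

```python
import random

def adversarial_scaling_reward(mu_a, c_t, rng=random):
    """Draw a Bernoulli reward with mean c_t * mu_a.

    mu_a: fixed stochastic mean of the arm (in [0, 1]).
    c_t:  adversarial scaling for this round, shared by all arms.
    """
    return 1.0 if rng.random() < c_t * mu_a else 0.0

# Illustrative (assumed) instance: two arms and a short adversary schedule.
mu = {"arm0": 0.8, "arm1": 0.4}      # fixed stochastic means
schedule = [1.0, 0.5, 0.0, 0.25]     # adversary's per-round scalings

rng = random.Random(0)
for t, c_t in enumerate(schedule):
    rewards = {a: adversarial_scaling_reward(m, c_t, rng) for a, m in mu.items()}
    # The ratio of expected rewards, mu["arm0"] / mu["arm1"] = 2, is the
    # same in every round; c_t only rescales the overall magnitude.
```

Note that a round with `c_t = 0` yields zero reward on every arm, which is how an adaptive adversary can starve naive mean estimates while leaving the arms' relative ordering unchanged.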
