Google Research

Bandits with Adversarial Scaling

ICML'20 (2020)


We study an intermediary model between stochastic and adversarial bandits, which we term adversarial scaling, where the rewards are a product between a stochastic component and an adversarial component shared by all arms. This can be viewed as a model where the ratios between the mean rewards remain fixed, but the magniture of rewards is rescaled by an adaptive adversary. We obtain logarithmic regret bounds for this setting. As a by-product we improve the regret of purely stochastic bandits in the special case where all the means are small. Finally we show that classical algorithms such as UCB and Thompson Sampling are susceptible to adversarial scaling attacks.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work