Bandits with Adversarial Scaling
We study an intermediary model between stochastic and adversarial bandits, which we term adversarial scaling, where the rewards are a product between a stochastic component and an adversarial component shared by all arms. This can be viewed as a model where the ratios between the mean rewards remain fixed, but the magniture of rewards is rescaled by an adaptive adversary. We obtain logarithmic regret bounds for this setting. As a by-product we improve the regret of purely stochastic bandits in the special case where all the means are small. Finally we show that classical algorithms such as UCB and Thompson Sampling are susceptible to adversarial scaling attacks.