Randomized Exploration in Generalized Linear Bandits

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AIStats-20) (2020), pp. 2066-2076 (to appear)

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their n-round regret, where d is the number of features and Kis the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and also apply them to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.