Google Research

Randomized Exploration in Generalized Linear Bandits

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AIStats-20) (2020), pp. 2066-2076 (to appear)

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their n-round regret, where d is the number of features and Kis the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and also apply them to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work