Randomized Exploration in Generalized Linear Bandits

Branislav Kveton; Manzil Zaheer; Csaba Szepesvari; Lihong Li; Mohammad Ghavamzadeh; Craig Boutilier

23rd International Conference on Artificial Intelligence and Statistics (2020)

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
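To make the two exploration schemes in the abstract concrete, here is a minimal sketch for a simulated logistic bandit. It follows only the abstract's description: GLM-FPL refits the model to rewards perturbed with Gaussian noise, and GLM-TSL samples parameters from a Laplace approximation around the maximum likelihood estimate. The function names, the ridge term `reg`, the Newton solver, the perturbation scale `a`, and the simulated environment in `run` are all illustrative assumptions, not the authors' implementation or tuning.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, reg=1.0, iters=25):
    """Newton's method for sum_l (sigmoid(x_l @ theta) - y_l) x_l + reg * theta = 0.
    Targets y may be real-valued, e.g. perturbed rewards. The ridge term `reg`
    is an assumption that keeps the problem well-posed with little data."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + reg * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + reg * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def glm_fpl_choose(arms, X_hist, y_hist, a, rng, reg=1.0):
    """Perturbed-history step: add N(0, a^2) noise to each past reward, refit."""
    noise = rng.normal(0.0, a, size=len(y_hist))
    theta = fit_logistic(X_hist, y_hist + noise, reg)
    return int(np.argmax(arms @ theta))

def glm_tsl_choose(arms, X_hist, y_hist, a, rng, reg=1.0):
    """Laplace-approximation step: sample theta ~ N(theta_hat, a^2 * H^{-1})."""
    theta_hat = fit_logistic(X_hist, y_hist, reg)
    p = sigmoid(X_hist @ theta_hat)
    H = (X_hist * (p * (1.0 - p))[:, None]).T @ X_hist + reg * np.eye(arms.shape[1])
    theta = rng.multivariate_normal(theta_hat, a**2 * np.linalg.inv(H))
    return int(np.argmax(arms @ theta))

def run(choose, n=200, K=5, d=3, a=0.5, seed=0):
    """Simulate n rounds on a hypothetical logistic bandit; return pulled arms."""
    rng = np.random.default_rng(seed)
    arms = rng.normal(size=(K, d)) / np.sqrt(d)   # fixed arm feature vectors
    theta_star = rng.normal(size=d)               # unknown true parameter
    X_hist, y_hist, pulls = np.zeros((0, d)), np.zeros(0), []
    for _ in range(n):
        i = choose(arms, X_hist, y_hist, a, rng)
        reward = float(rng.random() < sigmoid(arms[i] @ theta_star))  # Bernoulli
        X_hist = np.vstack([X_hist, arms[i]])
        y_hist = np.append(y_hist, reward)
        pulls.append(i)
    return pulls

if __name__ == "__main__":
    for name, choose in [("GLM-FPL", glm_fpl_choose), ("GLM-TSL", glm_tsl_choose)]:
        pulls = run(choose)
        print(name, "pull counts:", np.bincount(pulls, minlength=5))
```

Both selection rules share the same fitting routine and differ only in where the randomness enters: GLM-FPL randomizes the data before fitting, while GLM-TSL randomizes the parameters after fitting. The first variant needs no covariance estimate, which is one plausible reason it extends more readily to models such as neural networks.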

Research Areas

Machine intelligence
