Thompson Sampling with a Mixture Prior

Joey Hong
Branislav Kveton
Mohammad Ghavamzadeh
Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022), pp. 7565-7586

Abstract

We consider posterior sampling in online decision-making problems where the uncertain environment is sampled from a mixture distribution. We incorporate this structure in a natural way by initializing a Thompson sampling algorithm with a mixture prior. We provide a novel, general outline for analyzing the regret of Thompson sampling with a mixture prior, and use it to derive Bayes regret bounds in both linear bandit and tabular MDP settings. The bounds depend on the confidence widths of the individual mixture components and, when these widths are small, reduce to the cost of identifying the correct component. Finally, we demonstrate the empirical effectiveness of our proposed algorithm on a synthetic bandit problem and a real-world bandit problem involving multi-task image classification.
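As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the code below runs Thompson sampling on a K-armed Gaussian bandit whose vector of arm means is drawn from one of two Gaussian prior components. Because the mixture prior with conjugate components stays a mixture under Bayes updates, exact posterior sampling only requires conjugate updates of each component together with an update of the mixture weights. All numerical values, variable names, and the two-component Gaussian setup are illustrative assumptions.

```python
import numpy as np

# Sketch: Thompson sampling with a Gaussian mixture prior over the environment.
# One latent mixture component is drawn, and all arm means are drawn from it.
# Assumed setup (not from the paper): K Gaussian arms, known noise std sigma.

rng = np.random.default_rng(0)

K = 5                                # number of arms (assumed)
sigma = 0.5                          # known reward-noise std. dev. (assumed)
w0 = np.array([0.5, 0.5])            # prior mixture weights
m0 = np.array([-1.0, 1.0])           # prior means of the two components
s0 = np.array([0.3, 0.3])            # prior std. devs. of the two components
J = len(w0)

# Sample the true environment from the mixture prior.
true_j = rng.choice(J, p=w0)
true_means = rng.normal(m0[true_j], s0[true_j], size=K)

# Posterior state: per-arm, per-component conjugate Gaussian posteriors,
# plus shared (unnormalized) log mixture weights over the latent component.
post_m = np.tile(m0, (K, 1))         # shape (K, J)
post_v = np.tile(s0 ** 2, (K, 1))    # shape (K, J)
log_w = np.log(w0.copy())            # shape (J,)

def log_gauss(x, mean, var):
    """Log density of N(mean, var) evaluated at x (element-wise)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

T = 2000
for t in range(T):
    # Thompson sampling: sample a latent component from the posterior weights,
    # then sample each arm mean from that component's posterior.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    j = rng.choice(J, p=w)
    sampled_means = rng.normal(post_m[:, j], np.sqrt(post_v[:, j]))
    a = int(np.argmax(sampled_means))

    # Observe a noisy reward from the pulled arm.
    y = rng.normal(true_means[a], sigma)

    # Update the mixture weights with the predictive likelihood of y under
    # each component, then do the conjugate Gaussian update for that arm.
    log_w += log_gauss(y, post_m[a], post_v[a] + sigma ** 2)
    prec = 1.0 / post_v[a] + 1.0 / sigma ** 2
    post_m[a] = (post_m[a] / post_v[a] + y / sigma ** 2) / prec
    post_v[a] = 1.0 / prec
```

In this sketch the posterior weights concentrate on the component that generated the environment, which mirrors the abstract's observation that, when the components' confidence widths are small, the regret is driven by identifying the correct mixture component.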