- Joey Hong
- Branislav Kveton
- Manzil Zaheer
- Mohammad Ghavamzadeh
- Craig Boutilier
We consider posterior sampling in online decision-making problems where the uncertain environment is sampled from a mixture distribution. We incorporate this structure in a natural way by initializing a Thompson sampling algorithm with a mixture prior. We provide a novel, general outline for analyzing the regret of Thompson sampling with a mixture prior, and use it to derive Bayes regret bounds in both linear bandit and tabular MDP settings. The regret bounds depend on the confidence widths of each component of the mixture prior and, when those widths are small, reduce to the cost of identifying the correct component. Finally, we demonstrate the empirical effectiveness of our proposed algorithm on a synthetic bandit problem and on a real-world bandit problem involving multi-task image classification.
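To make the idea of initializing Thompson sampling with a mixture prior concrete, here is a minimal sketch. It is not the algorithm as specified in the paper: it assumes a simpler K-armed Gaussian bandit (rather than the linear bandit or tabular MDP settings above), a prior that is a finite mixture of independent Gaussians with a shared prior variance, and known reward noise. The names `MixtureTS`, `select_arm`, `update`, and `run_demo` are hypothetical and chosen only for illustration.

```python
import numpy as np


class MixtureTS:
    """Thompson sampling for a K-armed Gaussian bandit with a mixture prior.

    Assumed (illustrative) model: z ~ Categorical(weights),
    theta | z ~ N(means[z], prior_var * I), reward ~ N(theta[arm], noise_var).
    """

    def __init__(self, weights, means, prior_var, noise_var, rng):
        self.log_w = np.log(np.asarray(weights, dtype=float))  # log component weights
        self.mean = np.array(means, dtype=float)  # shape (M, K): per-component arm means
        self.var = np.full(self.mean.shape, float(prior_var))  # per-component arm variances
        self.noise_var = float(noise_var)
        self.rng = rng

    def weights(self):
        """Normalized posterior weights of the mixture components."""
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def select_arm(self):
        """Sample a component, then arm means from it, and act greedily."""
        z = self.rng.choice(len(self.log_w), p=self.weights())
        theta = self.rng.normal(self.mean[z], np.sqrt(self.var[z]))
        return int(np.argmax(theta))

    def update(self, arm, reward):
        """Exact sequential Bayes update for every mixture component."""
        m, v = self.mean[:, arm], self.var[:, arm]
        pred_var = v + self.noise_var  # predictive variance of the reward
        # Reweight each component by its predictive log-density of the reward.
        self.log_w += -0.5 * (np.log(2 * np.pi * pred_var) + (reward - m) ** 2 / pred_var)
        # Conjugate Gaussian update of the pulled arm within each component.
        gain = v / pred_var
        self.mean[:, arm] = m + gain * (reward - m)
        self.var[:, arm] = v * (1.0 - gain)


def run_demo(n_rounds=300, seed=0):
    """Run the sketch on a toy problem whose true means match component 1."""
    rng = np.random.default_rng(seed)
    agent = MixtureTS(
        weights=[0.5, 0.5],
        means=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
        prior_var=0.01,
        noise_var=0.25,
        rng=rng,
    )
    true_theta = np.array([0.0, 0.0, 1.0])
    for _ in range(n_rounds):
        arm = agent.select_arm()
        reward = rng.normal(true_theta[arm], np.sqrt(agent.noise_var))
        agent.update(arm, reward)
    return agent.weights()
```

Calling `run_demo()` returns the posterior component weights after the interaction. In this toy setup the components are well separated relative to their prior variance, so most of the learning amounts to identifying which component generated the environment, which is the regime the abstract describes for small confidence widths.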