Abstract
We give an $(\eps,\delta)$-differentially private algorithm for the Multi-Armed Bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\eps}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\eps}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the action $a$, and $k$ is the total number of actions.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
Research Areas
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work