Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Shahar Segal; tal lancewicki; Tomer Koren; Yishay Mansour

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Shahar Segal

tal lancewicki

Tomer Koren

Yishay Mansour

ICML 2021 (2021) (to appear)

Google Scholar

Abstract

We study the stochastic Multi-Armed Bandit~(MAB) problem with
random delays in the feedback received by the algorithm. We consider two settings: the {\it reward-dependent} delay setting, where realized delays may depend on the stochastic rewards, and the {\it reward-independent} delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution.
Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for the case of infinite delays where the algorithm might occasionally not observe any feedback.

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs