Contextual Blocking Bandits

Constantine Caramanis

Orestis Papadigenopoulas

Sanjay Shakkottai

Soumya Basu

International Conference on Artificial Intelligence and Statistics, PMLR (2021)

Download Google Scholar

Abstract

We study a novel variant of the multi-armed bandit problem, where at each time step, the
player observes an independently sampled context that determines the arms’ mean rewards.
However, playing an arm blocks it (across all contexts) for a fixed and known number of future time
steps. The above contextual setting, which captures important scenarios such as recommendation
systems or ad placement with diverse users, invalidates greedy solution techniques that are
effective for its non-contextual counterpart (Basu et al., NeurIPS19). Assuming knowledge of the
context distribution and the mean reward of each arm-context pair, we cast the problem as an
online bipartite matching problem, where the right-vertices (contexts) arrive stochastically and
the left-vertices (arms) are blocked for a finite number of rounds each time they are matched.
This problem has been recently studied in the full-information case, where competitive ratio
bounds have been derived. We focus on the bandit setting, where the reward distributions
are initially unknown; we propose a UCB-based variant of the full-information algorithm that
guarantees a O(log T)-regret w.r.t. an α-optimal strategy in T time steps, matching the Ω(log(T))
regret lower bound in this setting. Due to the time correlations caused by blocking, existing
techniques for upper bounding regret fail. For proving our regret bounds, we introduce the novel
concepts of delayed exploitation and opportunistic sub-sampling and combine them with ideas
from combinatorial bandits and non-stationary Markov chains coupling.

Research Areas

Algorithms and Theory

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Contextual Blocking Bandits

Abstract

Research Areas

Meet the teams driving innovation