Follow-ups Matter: Improving Contextual Bandits via Post-serving Features

Chaoqi Wang; Ziyu Ye; Zhe Feng; Ashwinkumar Badanidiyuru Varadaraja; Haifeng Xu

NeurIPS-23 (Spotlight) (2023)

Abstract

In the standard contextual bandit framework, the learner observes a context, selects an arm, and then observes the reward. This framework falls short in real-world scenarios where additional valuable context is revealed only after the arm is selected. We introduce a new algorithm, pLinUCB, that incorporates these post-serving contexts and achieves sublinear regret. Our key technical contribution is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which lets us handle noise in the context vectors, a crucial aspect in practical settings. This generalized EPL is of independent interest, as it has potential applications beyond the scope of this work. In extensive empirical tests on both synthetic and real-world datasets, our proposed algorithm outperforms the state of the art. Our work underscores the importance of post-serving contexts in the contextual bandit setting and lays the groundwork for further research in this area.
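To make the interaction protocol concrete, the following is a minimal sketch of a bandit loop in which a post-serving feature arrives after the arm is chosen. It is not the paper's pLinUCB: it is a generic LinUCB-style learner that, as an assumption made here for illustration, predicts the post-serving feature from the pre-serving context with ridge regression and then updates on the feature actually observed. The environment (linear rewards, a linear map `Phi` from context to post-serving feature, Gaussian noise), the function name `run_sketch`, and all parameters are hypothetical.

```python
import numpy as np


def run_sketch(T=500, d_x=3, d_z=2, n_arms=4, alpha=1.0, lam=1.0, seed=0):
    """Simulate a linear bandit whose reward also depends on a post-serving feature z.

    Returns the cumulative regret after each round (array of length T).
    """
    rng = np.random.default_rng(seed)
    d = d_x + d_z

    # Hypothetical environment: z = Phi @ x + noise, reward = theta[a] . [x, z] + noise.
    Phi = rng.normal(size=(d_z, d_x)) / np.sqrt(d_x)
    theta = rng.normal(size=(n_arms, d)) / np.sqrt(d)

    # Ridge-regression statistics for predicting z from x (shared across arms).
    G = lam * np.eye(d_x)
    H = np.zeros((d_x, d_z))
    # Per-arm ridge-regression statistics for the reward model.
    V = np.stack([lam * np.eye(d) for _ in range(n_arms)])
    b = np.zeros((n_arms, d))

    regret = np.zeros(T)
    for t in range(T):
        # 1. Observe the pre-serving context (norm clipped to at most 1).
        x = rng.normal(size=d_x)
        x /= max(1.0, np.linalg.norm(x))

        # 2. Predict the not-yet-observed post-serving feature from x.
        z_hat = np.linalg.solve(G, H).T @ x
        u = np.concatenate([x, z_hat])

        # 3. Pick the arm with the highest optimistic (UCB) score.
        scores = np.empty(n_arms)
        for a in range(n_arms):
            theta_hat = np.linalg.solve(V[a], b[a])
            width = np.sqrt(u @ np.linalg.solve(V[a], u))
            scores[a] = theta_hat @ u + alpha * width
        arm = int(np.argmax(scores))

        # 4. After serving, observe the post-serving feature and the reward.
        z = Phi @ x + 0.1 * rng.normal(size=d_z)
        full = np.concatenate([x, z])
        reward = theta[arm] @ full + 0.1 * rng.normal()

        # Regret against the best expected reward given x (E[z | x] = Phi @ x).
        means = theta @ np.concatenate([x, Phi @ x])
        regret[t] = means.max() - means[arm]

        # 5. Update both models with the features actually observed.
        G += np.outer(x, x)
        H += np.outer(x, z)
        V[arm] += np.outer(full, full)
        b[arm] += reward * full

    return np.cumsum(regret)


if __name__ == "__main__":
    cumulative = run_sketch()
    print("cumulative regret after", len(cumulative), "rounds:", cumulative[-1])
```

The sketch uses a fixed exploration weight `alpha` and makes no attempt to account for the error in `z_hat` inside the confidence width; handling that noise rigorously is what the abstract attributes to the generalized EPL.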
