Follow-ups Matter: Improving Contextual Bandits via Post-serving Features

Chaoqi Wang
Ziyu Ye
Ashwinkumar Badanidiyuru Varadaraja
Haifeng Xu
NeurIPS-23 (Spotlight) (2023)

Abstract

The standard contextual bandit framework involves observing a context, selecting an arm, and then observing the reward. This framework often falls short in real-world scenarios, where additional valuable contexts are revealed after arm selection. We introduce a new algorithm, pLinUCB, designed to incorporate these post-serving contexts effectively, thereby achieving sublinear regret. Our key technical contribution is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which allows us to handle noise in the context vectors, a crucial aspect in practical settings. This generalized EPL is of independent interest, as it has potential applications beyond the scope of this work. Through extensive empirical tests on both synthetic and real-world datasets, we demonstrate that our proposed algorithm outperforms the state of the art, thus establishing its practical potential. Our work underscores the importance of post-serving contexts in the contextual bandit setting and lays the groundwork for further research in this field.
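
For readers unfamiliar with the Elliptical Potential Lemma mentioned above, the following is a minimal numerical sketch of its classical form only, not the robustified and generalized version contributed by the paper. It assumes the textbook statement: with V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T, the sum over t of min(1, x_t^T V_{t-1}^{-1} x_t) is at most 2 * log(det V_T / det V_0). The function name `elliptical_potential` and the parameter `lam` are illustrative and do not come from the paper.

```python
import numpy as np


def elliptical_potential(xs, lam=1.0):
    """Return (potential_sum, bound) for the classical Elliptical Potential Lemma.

    xs  : array of shape (T, d), one context vector per round (illustrative input).
    lam : ridge regularizer, so that V_0 = lam * I (an assumption of this sketch).
    """
    xs = np.asarray(xs, dtype=float)
    d = xs.shape[1]
    V = lam * np.eye(d)  # V_0
    potential_sum = 0.0
    for x in xs:
        # Squared norm of x_t weighted by the inverse of V_{t-1}.
        u = x @ np.linalg.solve(V, x)
        potential_sum += min(1.0, u)
        # Rank-one update: V_t = V_{t-1} + x_t x_t^T.
        V += np.outer(x, x)
    # Bound: 2 * log(det V_T / det V_0), with det V_0 = lam ** d.
    _, logdet = np.linalg.slogdet(V)
    bound = 2.0 * (logdet - d * np.log(lam))
    return potential_sum, bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    contexts = rng.normal(size=(200, 5))  # synthetic contexts, for illustration only
    total, bound = elliptical_potential(contexts, lam=1.0)
    print(f"potential sum = {total:.3f}, log-det bound = {bound:.3f}")
```

The sketch only checks the inequality on exactly observed vectors. The setting described in the abstract, where context vectors are noisy, is what the paper's generalized lemma addresses, and it is not reproduced here.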