Values of Exploration in Recommender Systems

Can Xu
Elaine Le
Mohit Sharma
Su-Lin Wu
Yuyan Wang
RecSys (2021)


Reinforcement learning (RL) has been sought after as a way to build next-generation recommender systems that improve user experience on recommendation platforms. While the exploration-exploitation tradeoff is the foundation of RL research, the value of exploration in RL-based recommender systems is less well understood. Exploration, commonly seen as a tool to reduce model uncertainty in regions with sparse user interaction or feedback, is believed to cost user experience in the short term, while the indirect benefit of better model quality arrives later. We argue, on the other hand, that recommender systems have an inherent need for exploration and that exploration can improve user experience even in the near term. We focus on understanding the role of exploration in changing different facets of recommendation quality that more directly impact user experience. To do that, we introduce a series of methods inspired by exploration research to increase exploration in an RL-based recommender system, and study their effect on the end recommendation quality, specifically accuracy, diversity, novelty, and serendipity. We propose a set of metrics to measure RL-based recommender systems in these four aspects and evaluate the impact of exploration-inducing methods against these metrics. In addition to the offline measurements, we conduct live experiments on an industrial recommendation platform serving billions of users to showcase the benefit of exploration. Moreover, we use user conversion as an indicator of the holistic long-term user experience and study the value of exploration in helping platforms convert users. Connecting the offline analyses and live experiments, we begin to relate these four facets of recommendation quality to long-term user experience, and we identify serendipity as a desirable recommendation quality that changes user states and improves long-term user experience.
