SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets

Eugene Ie; Vihan Jain; Jing Wang; Sanmit Narvekar; Ritesh Agarwal; Rui Wu; Heng-Tze Cheng; Tushar Chandra; Craig Boutilier


Proceedings of the Twenty-eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macau, China (2019), pp. 2592-2599


Abstract

Reinforcement learning (RL) methods for recommender systems optimize recommendations for long-term user engagement. However, users are often presented with slates of multiple items, which may have interacting effects on user choice, so methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
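To make the decomposition concrete, here is a minimal Python sketch of one form it can take: the LTV of a slate A in state s is written as a choice-probability-weighted sum of item-wise LTVs, Q(s, A) = sum over i in A of P(i | s, A) * Q_item(s, i). The abstract does not specify the choice model, so everything below is an illustrative assumption rather than the authors' implementation: the multinomial-logit (softmax) choice model, the "no click" alternative, and the names slate_value, scores, item_ltvs, no_click_score, and no_click_ltv are all hypothetical.

```python
import math
from typing import Sequence


def slate_value(
    scores: Sequence[float],
    item_ltvs: Sequence[float],
    no_click_score: float = 0.0,
    no_click_ltv: float = 0.0,
) -> float:
    """Slate LTV as a choice-probability-weighted sum of item-wise LTVs.

    Assumption (not from the abstract): the user picks at most one item,
    with probabilities given by a softmax over `scores` plus a
    "no click" alternative scored by `no_click_score`.
    """
    if len(scores) != len(item_ltvs):
        raise ValueError("scores and item_ltvs must have the same length")
    if not scores:
        # An empty slate leaves only the "no click" alternative.
        return no_click_ltv

    # Subtract the maximum score before exponentiating for numerical stability.
    shift = max(max(scores), no_click_score)
    weights = [math.exp(s - shift) for s in scores]
    null_weight = math.exp(no_click_score - shift)
    total = sum(weights) + null_weight

    # Weight each item's LTV by the probability that the user selects it.
    value = sum(w / total * q for w, q in zip(weights, item_ltvs))
    return value + null_weight / total * no_click_ltv


# Two equally attractive items with item-wise LTVs 1.0 and 3.0.
example = slate_value(scores=[0.0, 0.0], item_ltvs=[1.0, 3.0])
```

The point of the sketch is that a slate's value is assembled from per-item quantities, so a learner only needs to estimate item-wise LTVs rather than one value per possible slate.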

Research Areas

Machine intelligence
