Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

Eugene Ie

Vihan Jain

Jing Wang

Sanmit Narvekar

Ritesh Agarwal

Rui Wu

Heng-Tze Cheng

Morgane Lustman

Vince Gatto

Paul Covington

Jim McFadden

Tushar Chandra

Craig Boutilier

arXiv (2019)

Abstract

Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items, which may have interacting effects on user choice, methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user-choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. (ii) We outline a methodology that leverages existing myopic learning-based recommenders to quickly develop a recommender that handles LTV. (iii) We demonstrate our methods in simulation, and validate the scalability of decomposed TD-learning using SlateQ in live experiments on YouTube.
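
To make the decomposition in contribution (i) concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the user picks at most one item from the slate according to a conditional-logit choice model with a "select nothing" option, and that selecting nothing contributes zero value; the abstract only states "mild assumptions on user-choice behavior", so both are assumptions of this sketch. The names `choice_probs`, `slate_value`, `best_slate`, `scores`, `item_ltv`, and `null_score` are hypothetical. Under these assumptions, the LTV of a slate is the choice-probability-weighted sum of its item-wise LTVs.

```python
import math
from itertools import combinations


def choice_probs(scores, null_score=0.0):
    """Conditional-logit selection probabilities for the items on a slate.

    `scores` are hypothetical user-item affinities; `null_score` is the
    affinity of the "select nothing" option (an assumption of this sketch).
    """
    weights = [math.exp(v) for v in scores]
    total = sum(weights) + math.exp(null_score)
    return [w / total for w in weights]


def slate_value(slate, scores, item_ltv, null_score=0.0):
    """Slate LTV as the choice-weighted sum of item-wise LTVs.

    Assumes that selecting nothing contributes zero value.
    """
    probs = choice_probs([scores[i] for i in slate], null_score)
    return sum(p * item_ltv[i] for p, i in zip(probs, slate))


def best_slate(scores, item_ltv, k):
    """Brute-force search over all k-item slates (tiny catalogs only)."""
    candidates = combinations(range(len(scores)), k)
    return max(candidates, key=lambda s: slate_value(s, scores, item_ltv))


# Three equally attractive items with different long-term values.
scores = [0.0, 0.0, 0.0]
item_ltv = [1.0, 2.0, 3.0]
print(best_slate(scores, item_ltv, k=2))  # (1, 2)
```

The exhaustive search in `best_slate` is for illustration only: its cost grows combinatorially with catalog and slate size, which is the action-space problem the abstract describes. The point of the sketch is that each candidate slate is scored from per-item quantities alone, so only item-wise LTVs need to be learned.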
