Reward Shaping for User Satisfaction in a REINFORCE Recommender

Can Xu
Sriraj Badam
Trevor Potter
Daniel Li
Hao Wan
Elaine Le
Chris Berg
Eric Bencomo Dixon
(2021)

Abstract

How might we design Reinforcement Learning (RL)-based recommenders that
align user trajectories with the underlying user satisfaction?
Three research questions are key: (1) measuring user satisfaction, (2)
combating sparsity of satisfaction signals, and (3) adapting the training of
the recommender agent to maximize satisfaction. For measurement, surveys that
explicitly ask users to rate their experience with consumed items have been
found to provide valuable information orthogonal to the engagement/interaction
data, acting as a proxy for the underlying user satisfaction. For sparsity,
i.e., only being able to observe how satisfied users are with a tiny fraction
of user-item interactions, imputation models can be useful for predicting
satisfaction levels for all items users have consumed. For
learning satisfying recommender policies, we postulate that reward shaping in
RL recommender agents is powerful for driving satisfying user experiences.
Putting everything together, we propose to jointly learn a policy network and a
satisfaction imputation network: the imputation network learns which actions
are satisfying to the user, while the policy network, built on top of
REINFORCE, decides which items to recommend, with the reward shaped by the
imputed satisfaction. We use both offline analysis and live experiments on an
industrial large-scale recommendation platform to demonstrate the promise of
our approach for delivering satisfying user experiences.
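As a rough illustration of the joint training described above, the sketch below
pairs a REINFORCE policy network with a satisfaction imputation network and
shapes the policy reward with the imputed scores. It is a minimal sketch only:
the dimensions (STATE_DIM, NUM_ITEMS, HIDDEN), the mixing weight alpha, the
network architectures, and the synthetic batch are illustrative assumptions,
not the authors' production setup, and details such as off-policy correction
and the exact survey labels are omitted.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical dimensions; the abstract does not specify them.
STATE_DIM, NUM_ITEMS, HIDDEN = 32, 1000, 64

class PolicyNet(nn.Module):
    """Softmax policy over the item corpus (the REINFORCE agent)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, NUM_ITEMS))

    def forward(self, state):
        return Categorical(logits=self.net(state))

class ImputationNet(nn.Module):
    """Predicts a satisfaction score in [0, 1] for a (state, item) pair."""
    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS, HIDDEN)
        self.head = nn.Sequential(nn.Linear(STATE_DIM + HIDDEN, HIDDEN),
                                  nn.ReLU(), nn.Linear(HIDDEN, 1))

    def forward(self, state, item):
        x = torch.cat([state, self.item_emb(item)], dim=-1)
        return torch.sigmoid(self.head(x)).squeeze(-1)

policy, imputer = PolicyNet(), ImputationNet()
opt = torch.optim.Adam(list(policy.parameters()) + list(imputer.parameters()),
                       lr=1e-3)
alpha = 0.5  # weight on imputed satisfaction in the shaped reward (assumed)

def train_step(state, item, engagement, survey_label, has_survey):
    """One joint update on a logged batch of (state, action, feedback)."""
    # Imputation loss: supervised only where a survey response was observed.
    sat_pred = imputer(state, item)
    imp_loss = (nn.functional.binary_cross_entropy(sat_pred[has_survey],
                                                   survey_label[has_survey])
                if has_survey.any() else torch.tensor(0.0))

    # Shaped reward: engagement signal plus imputed satisfaction.
    reward = engagement + alpha * sat_pred.detach()

    # REINFORCE: scale the log-probability of each taken action by its reward.
    log_prob = policy(state).log_prob(item)
    pg_loss = -(reward * log_prob).mean()

    opt.zero_grad()
    (pg_loss + imp_loss).backward()
    opt.step()

# Synthetic batch for illustration only; surveys cover a small fraction.
B = 16
state = torch.randn(B, STATE_DIM)
item = torch.randint(0, NUM_ITEMS, (B,))
engagement = torch.rand(B)
survey_label = torch.randint(0, 2, (B,)).float()
has_survey = torch.rand(B) < 0.1
train_step(state, item, engagement, survey_label, has_survey)
```

Detaching the imputed score before it enters the reward keeps the imputation
network trained purely by the supervised survey loss, while the policy gradient
only sees the shaped reward; both networks still share one optimizer step.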