Google Research

Self-Supervised Reinforcement Learning for Recommender Systems



In session-based or sequential recommendation, it is important to consider a number of factors such as long-term user engagement and multiple types of user interactions (e.g., clicks and purchases). Current state-of-the-art supervised approaches fail to model these factors appropriately, which limits their performance. In this context, casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is modeling the reward through the interaction between an agent and the environment. This is challenging in the recommendation setting due to the purely off-policy setting and the lack of negative rewards (feedback). When building models for recommender systems, it is often problematic to train a model in an online fashion (as required by many modern RL methods), because doing so would expose users to irrelevant recommendations. As a result, offline learning from logged implicit feedback is of vital importance. In this paper, we propose a self-supervised reinforcement learning approach for sequential recommendation tasks. Our approach has two components: one for supervised learning and another for reinforcement learning. The layer trained with reinforcement learning acts as a regularizer that drives the supervised head to focus on specific rewards (e.g., helping the user make purchases), while the supervised head with cross-entropy loss provides negative gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate four state-of-the-art generative recommendation models into our frameworks. Experimental results on two datasets demonstrate the effectiveness of our approach.
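To illustrate the two-head idea described in the abstract, the sketch below combines a supervised cross-entropy loss with a one-step Q-learning TD loss for a single logged transition. This is a minimal toy sketch, not the paper's implementation: the function names are assumptions, and a plain (rather than double) Q-learning target is used for brevity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over item scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def sqn_loss(logits_sup, q_values, q_next, action, reward, gamma=0.5):
    """Joint loss for one logged transition (hypothetical sketch).

    logits_sup: supervised head scores over candidate items
    q_values:   RL head Q-estimates for the current state
    q_next:     RL head Q-estimates for the next state
                (a target network would typically be used in practice)
    action:     index of the item the user actually interacted with
    reward:     reward signal, e.g. larger for a purchase than a click
    """
    # Supervised head: cross-entropy on the observed next item,
    # which also supplies negative gradient signals for all other items.
    ce = -np.log(softmax(logits_sup)[action])
    # RL head: squared one-step TD error, acting as a regularizer
    # that biases the model toward high-reward interactions.
    td_target = reward + gamma * q_next.max()
    td = (q_values[action] - td_target) ** 2
    return ce + td
```

In this formulation both heads share the same sequential state encoder (e.g., one of the four generative recommendation models), and the two losses are simply summed per mini-batch.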
