Self-Supervised Reinforcement Learning for Recommender Systems

Xin Xin
Alexandros Karatzoglou
Ioannis Arapakis
Joemon Jose
Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20) (2020)


In session-based or sequential recommendation, it is important to consider a number of factors such as long-term user engagement and rich user interaction modes (e.g., clicks, purchase history). Current state-of-the-art supervised approaches fail to model these factors appropriately, which hurts performance. In this context, casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is modeling the reward through the interaction between an agent and the environment. This is challenging in the recommendation setting due to the purely off-policy nature of the data and the lack of negative rewards (feedback). When building models for recommender systems, it is often problematic to train a model online (as required by many modern RL methods) because doing so exposes users to irrelevant recommendations. As a result, offline learning from logged implicit feedback is of vital importance. In this paper, we propose a self-supervised reinforcement learning approach for sequential recommendation tasks. Our approach has two components: one for supervised learning and another for reinforcement learning. The layer trained with reinforcement learning acts as a regularizer that drives the supervised head toward specific rewards (e.g., helping the user make purchases), while the supervised head with cross-entropy loss provides negative gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate four state-of-the-art generative recommendation models into our frameworks. Experimental results on two datasets demonstrate the effectiveness of our approach.
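The two-component objective described above can be sketched as a combined loss: a cross-entropy term over all items (supplying negative gradient signal for unobserved items) plus a one-step TD-error term on the observed action (the reward-driven regularizer). The following is a minimal illustrative sketch, not the authors' implementation; the function name, tensor shapes, and the use of a max-over-actions TD target are assumptions made for clarity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sqn_style_loss(logits, q_values, q_next, actions, rewards, gamma=0.9):
    """Illustrative SQN-style combined loss (hypothetical sketch).

    logits:   (batch, n_items) scores from the supervised head
    q_values: (batch, n_items) Q-head estimates for the current state
    q_next:   (batch, n_items) Q-head estimates for the next state
    actions:  (batch,) indices of the observed (positive) items
    rewards:  (batch,) rewards assigned to the observed interactions
    """
    batch = np.arange(len(actions))
    # Supervised part: cross-entropy over the full item catalog,
    # which pushes down scores of all unobserved items.
    probs = softmax(logits)
    ce = -np.log(probs[batch, actions] + 1e-12).mean()
    # RL part: squared one-step TD error on the observed action,
    # acting as a reward-aware regularizer on the shared layers.
    td_target = rewards + gamma * q_next.max(axis=-1)
    td = ((q_values[batch, actions] - td_target) ** 2).mean()
    return ce + td
```

In the full SAC variant the paper describes, the Q-head additionally weights the supervised loss in an actor-critic fashion; the sketch above corresponds only to the simpler SQN-style summation of the two terms.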