Eliciting User Preferences for Personalized Multi-Objective Reinforcement Learning through Comparative Feedback

Han Shao

Lee Cohen

Avrim Blum

Yishay Mansour

Aadirupa Saha

Matthew R. Walter

NeurIPS 2023

Abstract

In classic reinforcement learning (RL) problems, policies are evaluated with respect to some reward function and all optimal policies obtain the same expected return. However, when considering real-world dynamic environments in which different users have different preferences, a policy that is optimal for one user might be sub-optimal for another. In this work, we propose a multi-objective reinforcement learning framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our setup consists of a Markov Decision Process with a multi-objective reward function, in which each user corresponds to an (unknown) personal preference vector and their reward in each state-action is the inner product of their preference vector with the multi-objective reward at that state-action. Our goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and the user feedback is their preferred policy. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
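
The reward model in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it only shows how an inner-product preference turns multi-objective values into a single comparison answer. By linearity of expectation, a user's expected return under a policy is the inner product of their preference vector with that policy's expected multi-objective return. The function names `scalarize` and `preferred_policy`, the noiseless comparison, and all numbers are assumptions made for illustration.

```python
from typing import Sequence


def scalarize(w: Sequence[float], r: Sequence[float]) -> float:
    """Scalar reward for one user: the inner product <w, r>.

    w is the user's preference vector and r is a multi-objective
    reward (or expected return) vector of the same length.
    """
    if len(w) != len(r):
        raise ValueError("preference and reward vectors must have equal length")
    return sum(wi * ri for wi, ri in zip(w, r))


def preferred_policy(
    w: Sequence[float], value_a: Sequence[float], value_b: Sequence[float]
) -> str:
    """Simulated, noiseless comparison feedback (an assumption of this sketch).

    value_a and value_b are the expected multi-objective returns of two
    policies. The user picks the one with the larger scalarized value;
    ties go to "A".
    """
    return "A" if scalarize(w, value_a) >= scalarize(w, value_b) else "B"


# Hypothetical two-objective example: the same pair of policies,
# judged by two users with different preference vectors.
value_a = [1.0, 5.0]
value_b = [3.0, 1.0]

user_1 = [0.8, 0.2]  # mostly cares about the first objective
user_2 = [0.2, 0.8]  # mostly cares about the second objective

print(preferred_policy(user_1, value_a, value_b))
print(preferred_policy(user_2, value_a, value_b))
```

With these made-up numbers the two users disagree about which policy is better, which is the situation the abstract describes: a policy that is optimal for one user can be sub-optimal for another. In the paper's setting the preference vector is unknown, so answers like these are the only signal available to the learner.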
