Reinforcement Learning Can Be More Efficient with Multiple Rewards

Abstract

There is often a great degree of freedom in reward design when formulating a task as a reinforcement learning (RL) problem. The choice of reward function has a significant impact on the learned policy and on how fast the algorithm converges to it. Typically, several iterations of specifying and learning with the reward function are necessary to find one that leads to sample-efficient learning of the desired behavior. In this work, we instead propose to directly pass multiple alternate reward formulations of the task to the RL agent. We show that natural extensions of action-elimination algorithms to multiple rewards achieve more favorable instance-dependent regret bounds than their single-reward counterparts, both in multi-armed bandits and in tabular Markov decision processes. Specifically, our bounds scale for each state-action pair with the inverse of the most favorable gap among all reward functions. This suggests that learning with multiple rewards can indeed be more sample-efficient, as long as the rewards agree on an optimal policy. We further prove that when the rewards do not agree on the optimal policy, multi-reward action elimination in multi-armed bandits still learns a policy that is good across all reward functions.
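To make the core idea concrete, the sketch below shows one plausible multi-reward action-elimination loop for the bandit case, not the paper's exact algorithm. It assumes a hypothetical `pull(arm)` callback that returns one bounded observation per reward formulation for each pull, and it uses illustrative Hoeffding-style confidence radii; an arm is dropped as soon as it is confidently beaten under any one of the reward functions, which is safe when all rewards agree on the optimal arm and lets the arm's most favorable gap drive elimination.

```python
import numpy as np

def multi_reward_action_elimination(pull, n_arms, n_rewards, horizon, delta=0.05):
    """Sketch of action elimination with multiple reward signals.

    Assumptions (not from the paper): pull(arm) returns a length-n_rewards
    vector of rewards in [0, 1], and the confidence-radius constants are
    illustrative rather than tuned.
    """
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    means = np.zeros((n_arms, n_rewards))  # empirical mean per (arm, reward)

    t = 0
    while t < horizon and len(active) > 1:
        # Uniform exploration: pull every surviving arm once per round.
        for arm in active:
            obs = np.asarray(pull(arm))                     # shape (n_rewards,)
            counts[arm] += 1
            means[arm] += (obs - means[arm]) / counts[arm]  # running mean update
            t += 1

        # Hoeffding-style radius with a union bound over arms, rewards, rounds.
        rad = np.sqrt(np.log(4 * n_arms * n_rewards * horizon / delta)
                      / (2 * counts[active][:, None]))
        ucb = means[active] + rad
        lcb = means[active] - rad

        # Eliminate an arm if some other active arm beats it with high
        # confidence under at least one reward function.
        keep = []
        for i, arm in enumerate(active):
            dominated = any(
                ucb[i, j] < lcb[k, j]
                for j in range(n_rewards)
                for k in range(len(active)) if k != i
            )
            if not dominated:
                keep.append(arm)
        active = keep

    return active  # surviving arm(s); w.h.p. the optimal arm if rewards agree
```

Because elimination can be triggered by whichever reward function separates a suboptimal arm fastest, the number of pulls spent on that arm is governed by its largest gap across the reward functions, mirroring the abstract's claim that the bounds scale with the inverse of the most favorable gap.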

Research Areas