Reinforcement Learning with Feedback Graphs

Christoph Dann

Yishay Mansour

Mehryar Mohri

Ayush Sekhari

Karthik Sridharan

NeurIPS2020 (2020)

Download Google Scholar

Abstract

We study reinforcement learning in tabular MDPs where the agent receives additional side observations per step in the form of several transition samples -- e.g. from data augmentation. We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage side observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph, in contrast with a polynomial dependency on the number of states and actions in the absence of side observations. Finally, we highlight challenges when leveraging a small dominating set of the feedback graph as compared to the well-studied bandit setting and propose a new algorithm that can use such a dominating set to learn a near-optimal policy faster.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Reinforcement Learning with Feedback Graphs

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Reinforcement Learning with Feedback Graphs

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities