Path Consistency Learning in Tsallis Entropy Regularized MDPs

Mohammad Ghavamzadeh

Ofir Nachum

Yinlam Chow

ICML (2018)

Abstract

We study the sparse entropy-regularized RL (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of standard (soft) ERL, namely having a softmax optimal policy. The problem with a softmax policy is that at every state, it may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. Lee et al. (2018) studied the properties of the sparse ERL problem and proposed value-based algorithms to solve it. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a consistency equation for sparse ERL, called sparse consistency. We then prove that sparse consistency only implies sub-optimality (unlike the soft consistency in soft ERL). We then use the sparse consistency to derive our sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.
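
The contrast between a softmax policy and a sparse policy can be illustrated with a small sketch. It is not taken from the paper: it assumes the sparse policy at a state is the sparsemax (Euclidean projection onto the probability simplex) of the action values divided by a temperature, which is the form usually associated with this special case of the Tsallis entropy. The names `softmax`, `sparsemax`, `q_values`, and `alpha`, and the numbers used, are hypothetical and chosen only for illustration.

```python
import math

def softmax(z):
    """Softmax distribution: every action receives strictly positive mass."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Actions whose score falls below the threshold tau get exactly zero mass.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        # The support is a prefix of the sorted scores; the last k that
        # satisfies this condition determines the threshold.
        if 1 + k * v > cumsum:
            tau = (cumsum - 1) / k
    return [max(v - tau, 0.0) for v in z]

# Hypothetical action values at a single state (assumption, not paper data).
q_values = [2.0, 1.8, 0.1, 0.0, -0.5, -1.0]
alpha = 1.0  # assumed regularization temperature

scaled = [q / alpha for q in q_values]
soft_policy = softmax(scaled)
sparse_policy = sparsemax(scaled)

print("soft  :", [round(p, 3) for p in soft_policy])
print("sparse:", [round(p, 3) for p in sparse_policy])
```

With these illustrative values, the softmax policy spreads mass over all six actions, while the sparsemax policy places roughly 0.6 and 0.4 on the two highest-valued actions and exactly zero on the rest.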
