Budgeted Reinforcement Learning in Continuous State Space

Nicolas Carrara; Edouard Leurent; Romain Laroche; Tanguy Urvoy; Odalric-Ambrym Maillard; Olivier Pietquin

NeurIPS (2019)

Abstract

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below a threshold that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
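
The constraint described in the abstract can be read as a constrained optimization over policies. The following is only a sketch in notation assumed here, not taken from the text above: $R_r$ and $R_c$ stand for the reward and cost signals, $\gamma \in [0, 1)$ is an assumed discount factor, $\pi$ is a policy, and $\beta$ is the budget threshold.

```latex
% Sketch of a budgeted objective (all symbols are assumptions for illustration):
% maximize expected discounted reward while keeping expected discounted cost below the budget beta.
\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R_r(s_t, a_t) \,\middle|\, \pi \right]
\quad \text{subject to} \quad
\mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R_c(s_t, a_t) \,\middle|\, \pi \right] \le \beta
```

Because the threshold can be modified in real time, a solution under this reading would need to cover a range of values of $\beta$, not one fixed budget.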
