Multi-task Deep Reinforcement Learning with PopArt

Matteo Hessel
Hubert Soyer
Wojciech Czarnecki
Simon Schmitt
Hado van Hasselt
DeepMind (2019)

Abstract

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring the training of a brand-new agent instance. In this work, we investigate algorithms capable of learning to master not one but multiple sequential-decision tasks at once. We use PopArt normalisation to derive scale-invariant policy-gradient updates, and we propose an actor-critic architecture designed for multi-task learning. In combination with the IMPALA reinforcement-learning architecture, this results in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learns a single trained policy - with a single set of weights - that exceeds median human performance across all games. To our knowledge, this is the first time a single agent surpasses human-level performance on this multi-task domain. The same approach demonstrates state-of-the-art results on a set of 30 tasks defined in the 3D reinforcement learning platform DeepMind Lab.
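To give a concrete sense of the normalisation the abstract refers to, below is a minimal sketch of PopArt (van Hasselt et al., 2016): running statistics of the value targets are updated, and the linear value head is simultaneously rescaled so its unnormalised predictions are preserved, which makes the resulting gradients invariant to the scale of each task's returns. The class name, the linear head, and the step size `beta` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PopArt:
    """Minimal sketch of PopArt normalisation for a linear value head
    v(x) = w @ x + b. Names and hyperparameters are illustrative."""

    def __init__(self, num_features, beta=3e-4):
        self.w = np.zeros(num_features)  # weights of the normalised value head
        self.b = 0.0                     # bias of the normalised value head
        self.mu = 0.0                    # running first moment of the targets
        self.nu = 1.0                    # running second moment of the targets
        self.beta = beta                 # step size for the statistics (assumed)

    @property
    def sigma(self):
        # Standard deviation of the targets, with a floor for stability.
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def update_stats(self, target):
        """ART: adaptively rescale targets. POP: preserve outputs precisely."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu += self.beta * (target - self.mu)
        self.nu += self.beta * (target ** 2 - self.nu)
        new_sigma = self.sigma
        # Rescale the head so the unnormalised predictions are unchanged.
        self.w *= old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalised_td_error(self, features, target):
        """Error in normalised space; its gradient w.r.t. (w, b) is
        invariant to the scale of the returns on each task."""
        prediction = self.w @ features + self.b
        return (target - self.mu) / self.sigma - prediction
```

In the multi-task setting of the paper, one set of such statistics would be kept per task, so that games with very different score scales contribute comparably scaled gradients to the shared network.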