IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

  • Lasse Espeholt
  • Hubert Soyer
  • Remi Munos
  • Karen Simonyan
  • Volodymyr Mnih
  • Tom Ward
  • Yotam Doron
  • Vlad Firoiu
  • Tim Harley
  • Iain Robert Dunning
  • Shane Legg
  • Koray Kavukcuoglu
ArXiv (2018) (to appear)


In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is handling the increased amount of data and extended training time, which are already a problem in single-task learning. To tackle this, we have developed a new distributed agent architecture, IMPALA (Importance Weighted Actor-Learner Architecture), that can scale to thousands of machines and achieve a throughput of $250,000$ frames per second. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment \cite{beattie2016dmlab}) and Atari-57 (all available Atari games in the Arcade Learning Environment \cite{bellemare13arcade}). Our results show that IMPALA achieves better performance than previous agents, uses less data, and, crucially, exhibits positive transfer between tasks as a result of its multi-task approach.
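
The abstract names V-trace but does not define it. As a rough illustration only, the sketch below computes V-trace value targets for one trajectory using the commonly stated recursion $v_s - V(x_s) = \delta_s V + \gamma c_s (v_{s+1} - V(x_{s+1}))$, with $\delta_s V = \rho_s (r_s + \gamma V(x_{s+1}) - V(x_s))$, $\rho_s = \min(\bar\rho, \pi/\mu)$ and $c_s = \min(\bar c, \pi/\mu)$. The function name, argument names, and default values are assumptions made for this example, not taken from the text above, and the code is not the authors' implementation.

```python
def vtrace_targets(values, bootstrap_value, rewards, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace targets for a single trajectory of length n.

    values[s]        -- learner's estimate V(x_s), s = 0..n-1
    bootstrap_value  -- V(x_n), the estimate for the state after the last step
    rewards[s]       -- reward r_s
    ratios[s]        -- importance ratio pi(a_s|x_s) / mu(a_s|x_s)
    All names here are hypothetical, chosen for illustration.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}); zero beyond the trajectory end
    for s in reversed(range(n)):
        rho = min(rho_bar, ratios[s])  # clipped weight on the TD error
        c = min(c_bar, ratios[s])      # clipped "trace" coefficient
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * c * acc
        targets[s] = values[s] + acc
    return targets
```

When the actor and learner policies coincide (all ratios equal to 1) and both clipping thresholds are at least 1, the targets reduce to ordinary n-step returns; ratios below the thresholds shrink the correction applied to each value estimate.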

