- Lasse Espeholt
- Hubert Soyer
- Remi Munos
- Karen Simonyan
- Volodymyr Mnih
- Tom Ward
- Yotam Doron
- Vlad Firoiu
- Tim Harley
- Iain Robert Dunning
- Shane Legg
- Koray Kavukcuoglu
Abstract
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time, which is already a problem in single-task learning. To tackle this, we have developed a new distributed agent architecture, IMPALA (Importance Weighted Actor-Learner Architecture), that can scale to thousands of machines and achieve a throughput rate of $250,000$ frames per second. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment \cite{beattie2016dmlab}) and Atari-57 (all available Atari games in the Arcade Learning Environment \cite{bellemare13arcade}). Our results show that IMPALA achieves better performance than previous agents with less data and, crucially, exhibits positive transfer between tasks as a result of its multi-task approach.
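For reference, the V-trace target at state $x_s$ is defined in the paper as the $n$-step return
$$v_s = V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{t-s} \left( \prod_{i=s}^{t-1} c_i \right) \delta_t V, \qquad \delta_t V = \rho_t \left( r_t + \gamma V(x_{t+1}) - V(x_t) \right),$$
where $\rho_t = \min\!\left(\bar\rho, \frac{\pi(a_t \mid x_t)}{\mu(a_t \mid x_t)}\right)$ and $c_i = \min\!\left(\bar c, \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\right)$ are clipped importance ratios between the learner policy $\pi$ and the actor (behaviour) policy $\mu$. Below is a minimal NumPy sketch of this backward recursion for a single trajectory; the function name, array layout, scalar discount, and default clipping thresholds are our own assumptions for illustration, not code from the paper.

```python
import numpy as np

def vtrace_targets(values, rewards, rhos, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace targets v_s for one trajectory of length T.

    values:  learner value estimates V(x_0..x_T), shape [T + 1]
             (the last entry is the bootstrap value).
    rewards: rewards r_0..r_{T-1}, shape [T].
    rhos:    importance ratios pi(a_s | x_s) / mu(a_s | x_s) between
             the learner policy pi and the behaviour policy mu, shape [T].
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)  # rho_s = min(rho_bar, pi/mu)
    clipped_cs = np.minimum(c_bar, rhos)      # c_s   = min(c_bar,  pi/mu)

    # Temporal-difference terms: delta_s V = rho_s (r_s + gamma V(x_{s+1}) - V(x_s)).
    deltas = clipped_rhos * (rewards + gamma * values[1:] - values[:-1])

    # Backward recursion: v_s - V(x_s) = delta_s V + gamma c_s (v_{s+1} - V(x_{s+1})).
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for s in reversed(range(T)):
        acc = deltas[s] + gamma * clipped_cs[s] * acc
        vs_minus_v[s] = acc

    return vs_minus_v + values[:-1]  # the V-trace targets v_s
```

In the full agent, the targets $v_s$ serve as regression targets for the value function, while the clipped ratios $\rho_s$ weight the policy-gradient update; note the paper uses per-timestep discounts $\gamma_t$, which the scalar $\gamma$ above simplifies.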