- Jost Tobias Springenberg
- Karol Hausman
- Martin Riedmiller
- Nicolas Heess
- Ziyu Wang
Abstract
We present a method for learning manipulation skills that are continuously parameterized in a skill embedding space, which can then be exploited to solve new tasks rapidly. The skills are learned by exploiting latent variables. The main contribution of our work is an entropy-regularized policy gradient formulation for hierarchical policies, together with an associated data-efficient and robust off-policy gradient algorithm based on stochastic value gradients. We demonstrate the effectiveness of our method on several simulated robotic manipulation tasks. We find that our method allows for the discovery of multiple solutions and is capable of learning the minimum number of distinct skills necessary to solve a given set of tasks.
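To make the idea of a latent-variable hierarchical policy more concrete, here is a minimal sketch. It assumes a linear-Gaussian skill embedding p(z | task) and a linear-Gaussian low-level policy π(a | s, z). All names, dimensions and the temperature `alpha` are hypothetical and are not taken from the paper. Both levels are sampled with the reparameterization trick, which is the mechanism stochastic value gradients rely on. The reward is augmented with the joint entropy H(z | task) + H(a | s, z) as a simple entropy bonus. The exact regularizer used in this work may differ.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def gaussian_entropy(log_stds: Sequence[float]) -> float:
    """Differential entropy of a diagonal Gaussian: sum_i 0.5*log(2*pi*e) + log_std_i."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e) + ls for ls in log_stds)


def linear(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


@dataclass
class HierarchicalPolicy:
    # Hypothetical parameters (an illustrative assumption, not the paper's model):
    # a task-conditioned skill embedding p(z | task) and a latent-conditioned
    # low-level policy pi(a | s, z), both linear-Gaussian.
    embed_mean: List[List[float]]      # z_dim x num_tasks
    embed_log_std: List[float]         # z_dim
    policy_weights: List[List[float]]  # a_dim x (s_dim + z_dim)
    policy_log_std: List[float]        # a_dim

    def sample_skill(self, task: int, rng: random.Random) -> List[float]:
        # Reparameterized sample z = mu(task) + sigma * eps, with eps ~ N(0, I).
        one_hot = [1.0 if i == task else 0.0 for i in range(len(self.embed_mean[0]))]
        mu = linear(self.embed_mean, one_hot)
        return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
                for m, ls in zip(mu, self.embed_log_std)]

    def act(self, state: Sequence[float], z: Sequence[float],
            rng: random.Random) -> List[float]:
        # The low-level policy sees the state concatenated with the skill latent z.
        mu = linear(self.policy_weights, list(state) + list(z))
        return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
                for m, ls in zip(mu, self.policy_log_std)]


def regularized_reward(reward: float, policy: HierarchicalPolicy, alpha: float) -> float:
    # Entropy bonus on the joint (z, a) distribution: H(z | task) + H(a | s, z).
    # alpha is an assumed temperature trading off reward against entropy.
    joint_entropy = (gaussian_entropy(policy.embed_log_std)
                     + gaussian_entropy(policy.policy_log_std))
    return reward + alpha * joint_entropy
```

As a usage example, a caller would draw one skill `z = policy.sample_skill(task, rng)` per episode and then call `policy.act(state, z, rng)` at every step. Because both samples are deterministic functions of the parameters plus independent noise, gradients of a learned value function can in principle flow back through `z` into the embedding parameters.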