Reward Machines for Vision-Based Robotic Manipulation
Abstract
Deep Q-learning (DQN) has enabled robot agents to accomplish vision-based tasks that once seemed out of reach. Despite recent success stories, several sources of computational complexity still challenge the performance of DQN. We focus on vision-based manipulation tasks, where correct action selection is often predicated on a small number of pixels. We observe that in some of these tasks DQN does not converge to the optimal Q-function, and its values do not separate optimal from suboptimal actions well. As a consequence, the policies obtained with DQN tend to be brittle and exhibit a low success rate, especially in long-horizon tasks. In this work we show the benefits of Reward Machines (RMs) for Deep Q-learning (DQRM) in vision-based robot manipulation tasks. Reward machines decompose the task at an abstract level, inform the agent of its current stage toward task completion, and guide it via dense rewards. We show that RMs help DQN learn the optimal Q-values in each abstract state. The resulting policies are more robust, achieve higher success rates, and are learned with fewer training steps than those of DQN. The benefits of RMs are most evident in long-horizon tasks, where we show that DQRM learns good-quality policies with six times fewer training steps than DQN, even when the latter is equipped with dense reward shaping.