Deepali Jain
I completed my undergraduate degree in Electrical Engineering at the Indian Institute of Technology Roorkee. After that, I worked as a Research Fellow at Adobe Research for two years, where I worked on problems in sequence modeling and human decision making.
I have now developed a research interest in Reinforcement Learning (RL). I am motivated by the prospect that RL can serve as a general framework for solving intelligent decision-making problems across domains. More specifically, I am interested in addressing research challenges of partial observability, sparse rewards, sample inefficiency, multiple goals and multi-agent dynamics. In general, I like to think and read about ways to advance machine cognition.
The Google AI Residency program has provided me with the right environment and mentorship to develop my research skills. I am collaborating with the Robotics NY team to improve the sample efficiency of RL algorithms for legged locomotion tasks. My mentors welcome new ideas from me and guide me in the right direction for further exploration. Google's infrastructure makes trying out experiments very easy. It is exciting to tackle the challenges of deploying policies on a real robot. I am enjoying this learning experience.
Outside work, I love creating art, reading and exploring the city.
Research Areas
Authored Publications
Robotic Table Tennis: A Case Study into a High Speed Learning System
Jon Abelian
Saminda Abeyruwan
Michael Ahn
Justin Boyd
Erwin Johan Coumans
Omar Escareno
Wenbo Gao
Navdeep Jaitly
Juhana Kangaspunta
Satoshi Kataoka
Gus Kouretas
Yuheng Kuang
Corey Lynch
Thinh Nguyen
Ken Oslund
Barney J. Reed
Anish Shankar
Avi Singh
Grace Vesom
Peng Xu
Robotics: Science and Systems (2023)
We present a deep-dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material.
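As a rough illustration of how the components described above fit together, the sketch below shows a perceive-act loop in which a perception subsystem, a learned policy, and a low-latency controller are chained at a fixed rate. All class and method names (perception, policy, robot) are hypothetical placeholders, not the authors' code.

import time

def control_loop(perception, policy, robot, hz=100):
    """Run one episode's worth of perceive-act steps at a fixed rate."""
    period = 1.0 / hz
    while not robot.episode_done():
        t0 = time.time()
        ball_state = perception.estimate_ball()    # ball position/velocity from cameras
        proprio = robot.joint_state()              # joint angles and velocities
        action = policy.act(ball_state, proprio)   # e.g., target joint velocities
        robot.send_command(action)                 # low-latency controller executes
        # Keeping total loop latency small is one of the factors studied in the ablations.
        time.sleep(max(0.0, period - (time.time() - t0)))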
Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Saminda Abeyruwan
Nick Boffi
Anish Shankar
Jean-Jacques Slotine
Stephen Tu
Learning for Dynamics and Control (2023)
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching.
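For the second strategy, zeroth-order policy search can be sketched as an evolutionary-strategies style gradient estimate from paired rollouts. The rollout_return function below is a hypothetical stand-in for a simulated or on-hardware episode, and the specific estimator is an illustrative assumption rather than the paper's exact implementation.

import numpy as np

def es_gradient(theta, rollout_return, num_dirs=32, sigma=0.05):
    """Antithetic zeroth-order estimate of the gradient of expected return."""
    grad = np.zeros_like(theta)
    for _ in range(num_dirs):
        eps = np.random.randn(*theta.shape)
        r_plus = rollout_return(theta + sigma * eps)    # one rollout per perturbation
        r_minus = rollout_return(theta - sigma * eps)
        grad += (r_plus - r_minus) * eps
    return grad / (2.0 * sigma * num_dirs)

def train(theta, rollout_return, steps=100, lr=0.02):
    """Plain gradient ascent on the blackbox return estimate."""
    for _ in range(steps):
        theta = theta + lr * es_gradient(theta, rollout_return)
    return theta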
i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops
Saminda Wishwajith Abeyruwan
Avi Singh
Anish Shankar
Conference on Robot Learning (Oral) (2022)
Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. However, prior works in sim-to-real transfer of robotic policies typically do not involve any human-robot interaction because accurately simulating human behavior is an open problem. In this work, our goal is to leverage the power of simulation to train robotic policies that are proficient at interacting with humans upon deployment. This presents a chicken-and-egg problem: how to gather examples of a human interacting with a physical robot so as to model human behavior in simulation without already having a robot that is able to interact with a human? Our proposed method, Iterative-Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are improved. We evaluate our method on a real-world robotic table tennis setting, where the objective for the robot is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other’s moves, making for a challenging test bed for research on human-robot interaction. We present results on a physical industrial robotic arm that is able to cooperatively play table tennis against human players, achieving rallies of 22 successive hits on average and 150 at best. Further, for 80% of players, rally lengths are 70% to 175% longer compared to the sim-to-real (S2R) baseline.
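The alternating structure described in the abstract can be summarized in a few lines of Python-style pseudocode. HumanModel, train_in_sim, and collect_real_rallies are hypothetical placeholders used only to show the loop, not the paper's code.

def i_sim2real(HumanModel, train_in_sim, collect_real_rallies, num_iterations=4):
    """Alternate between simulated training and real-world data collection."""
    human_model = HumanModel.bootstrap_simple()   # e.g., a coarse ball-return distribution
    policy = None
    for _ in range(num_iterations):
        # 1) Train/refine the robot policy in simulation against the current human model.
        policy = train_in_sim(human_model, init_policy=policy)
        # 2) Deploy the policy on hardware and play with real human partners.
        real_rallies = collect_real_rallies(policy)
        # 3) Refit the human-behavior model from the newly collected interaction data.
        human_model = HumanModel.fit(real_rallies)
    return policy, human_model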
Hybrid Random Features
Haoxian Chen
Han Lin
Yuanzhe Ma
Arijit Sehanobish
Michael Ryoo
Jake Varley
Andy Zeng
Valerii Likhosherstov
Dmitry Kalashnikov
Adrian Weller
International Conference on Learning Representations (ICLR) (2022)
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi & Recht, 2007) or (recently introduced in the context of linear-attention Transformers) positive random features (Choromanski et al., 2021b). By generalizing Bochner’s Theorem for softmax/Gaussian kernels and leveraging random features for compositional kernels, the HRF-mechanism provides strong theoretical guarantees - unbiased approximation and strictly smaller worst-case relative errors than its counterparts. We conduct an exhaustive empirical evaluation of HRFs, ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure, to benchmarking implicit-attention Transformers (also for downstream robotics applications), demonstrating their quality in a wide spectrum of machine learning problems.
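As background for the two baselines that HRFs build on, the small numerical sketch below estimates the softmax kernel SM(x, y) = exp(x^T y) with trigonometric random features (Rahimi & Recht, 2007) and with positive random features (Choromanski et al., 2021). It illustrates the standard estimators, not the HRF construction itself; all variable names and sizes are chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 4096                       # input dimension, number of random features
x, y = 0.3 * rng.normal(size=d), 0.3 * rng.normal(size=d)
W = rng.normal(size=(m, d))           # Gaussian projection matrix
exact = np.exp(x @ y)                 # softmax kernel SM(x, y)

# Trigonometric features estimate the Gaussian kernel exp(-||x - y||^2 / 2);
# the deterministic prefactor converts that into a softmax-kernel estimate.
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

def trig_features(v):
    return np.sqrt(2.0 / m) * np.cos(W @ v + b)

trig_est = np.exp((x @ x + y @ y) / 2.0) * (trig_features(x) @ trig_features(y))

# Positive features: dot products give unbiased, always-positive softmax estimates.
def positive_features(v):
    return np.exp(W @ v - (v @ v) / 2.0) / np.sqrt(m)

pos_est = positive_features(x) @ positive_features(y)

print(exact, trig_est, pos_est)       # both estimates should be close to `exact`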
Reward Machines for Vision-Based Robotic Manipulation.
Alberto Camacho
Andy Zeng
Dmitry Kalashnikov
Jake Varley
International Conference on Robotics and Automation (2021)
Deep Q learning (DQN) has enabled robot agents to accomplish vision-based tasks that seemed out of reach. Despite recent success stories, there are still several sources of computational complexity that challenge the performance of DQN. We place the focus on vision-based manipulation tasks, where the correct action selection is often predicated on a small number of pixels. We observe that in some of these tasks DQN does not converge to the optimal Q function, and its values do not separate optimal and suboptimal actions well. As a consequence, the policies obtained with DQN tend to be brittle and manifest a low success rate, especially in long-horizon tasks. In this work we show the benefits of Reward Machines (RMs) for Deep Q learning (DQRM) in vision-based robot manipulation tasks. Reward machines decompose the task at an abstract level, inform the agent of its current stage along task completion, and guide it via dense rewards. We show that RMs help DQN learn the optimal Q values in each abstract state. The resulting policies are more robust, achieve a higher success rate, and are learned with fewer training steps compared with DQN. The benefits of RMs are more evident in long-horizon tasks, where we show that DQRM is able to learn good-quality policies with six times fewer training steps than DQN, even when the latter is equipped with dense reward shaping.
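A reward machine can be pictured as a small finite state machine over abstract events. The toy pick-and-place machine below, with made-up events and reward values, shows how it emits dense rewards and exposes an abstract state that can be appended to the DQN observation; it is an illustration, not the paper's task definition.

class RewardMachine:
    def __init__(self):
        # Abstract states: 0 = nothing grasped, 1 = object grasped, 2 = object placed.
        self.state = 0
        # (state, event) -> (next_state, reward); values here are illustrative.
        self.transitions = {
            (0, "grasped"): (1, 0.5),
            (1, "dropped"): (0, -0.5),
            (1, "placed"):  (2, 1.0),
        }

    def step(self, event):
        """Advance the machine on a detected abstract event; return the dense reward."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward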
Learning to walk on complex terrains with vision
Ale Escontrela
Erwin Johan Coumans
Peng Xu
Sehoon Ha
Conference on Robot Learning (2021)
Visual feedback is crucial for legged robots to safely and efficiently handle uneven terrains such as stairs. However, training robots to effectively consume high-dimensional visual input for locomotion is challenging. In this work, we propose a framework to train a vision-based locomotion controller for quadruped robots to traverse a variety of uneven environments. Our key idea is to model the locomotion controller as a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes the perceived visual inputs and robot states as input and outputs the desired foothold placement and base movement of the robot, which are realized by a low-level motion controller composed of a position controller for swing legs and an MPC-based torque controller for stance legs. We train the vision policy using Deep Reinforcement Learning and demonstrate our approach on a variety of uneven environments such as step-stones, stairs, pillars, and moving platforms. We also deploy our policy on a real quadruped robot to walk over a series of random step-stones.
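The two-level structure described above can be sketched as follows. The class and method names are hypothetical; the sketch only mirrors the division of labor between the learned vision policy and the two low-level controllers.

class HierarchicalLocomotionController:
    def __init__(self, vision_policy, swing_position_controller, stance_mpc):
        self.vision_policy = vision_policy            # high-level policy trained with deep RL
        self.swing_ctrl = swing_position_controller   # position control for swing legs
        self.stance_ctrl = stance_mpc                 # MPC-based torque control for stance legs

    def step(self, depth_image, robot_state):
        # High level: perceived terrain + robot state -> footholds and base movement.
        footholds, base_motion = self.vision_policy(depth_image, robot_state)
        # Low level: realize the high-level plan with per-leg controllers.
        swing_targets = self.swing_ctrl.track(footholds, robot_state)
        stance_torques = self.stance_ctrl.solve(base_motion, robot_state)
        return swing_targets, stance_torques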
Disentangled Planning and Control in Vision Based Robotics via Reward Machines
Alberto Camacho
Andy Zeng
Dmitry Kalashnikov
Jake Varley
Deep Reinforcement Learning Workshop (Deep RL), collocated with NeurIPS 2020 (2020)
In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion. The reward machine can be used both for reward shaping and for informing the policy of the abstract state it is currently in. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals from the reward machine, reward shaping and knowledge of the current abstract state, complement each other and can both be used to improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to try to get a policy to learn the task without this structure.
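To make the two supervisory signals concrete, the sketch below shows one way a DQN-style data-collection step could consume them: a one-hot encoding of the abstract state is concatenated to the observation, and the reward-machine reward is added as shaping. env, labeler, rm, policy, and replay_buffer are hypothetical components, not the paper's implementation.

import numpy as np

def one_hot(i, n=3):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def collect_step(env, policy, labeler, rm, replay_buffer, obs):
    """One environment step with reward-machine shaping and state augmentation."""
    rm_state_before = rm.state
    action = policy.act(np.concatenate([obs, one_hot(rm_state_before)]))
    next_obs, env_reward, done = env.step(action)
    event = labeler(next_obs)                 # maps raw observations to abstract events
    shaped_reward = env_reward + rm.step(event)
    replay_buffer.add(obs, rm_state_before, action, shaped_reward,
                      next_obs, rm.state, done)   # Q-targets can condition on the RM state
    return next_obs, done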
Provably Robust Blackbox Optimization for Reinforcement Learning
Aldo Pacchiano
Jack Parker-Holder
Yunhao Tang
Yuxiang Yang
Conference on Robot Learning (CoRL 2019)
Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to 23% of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks including path tracking for quadruped robots.
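The core idea of recovering gradients by robust regression can be illustrated with a simplified sketch: probe the objective with random perturbations, treat the finite-difference responses as a (partially corrupted) linear regression problem in the gradient, and solve it with an L1 fit instead of least squares. The IRLS solver below is an assumption chosen for brevity; the paper uses more principled robust regression and off-policy gradient-flow estimation.

import numpy as np

def robust_gradient(f, theta, num_samples=200, sigma=0.05, irls_iters=30):
    """Recover an estimate of grad f(theta) from possibly corrupted probes."""
    d = theta.shape[0]
    deltas = sigma * np.random.randn(num_samples, d)
    # Each response y_i ~= grad . delta_i, but a fraction may be arbitrarily corrupted.
    ys = np.array([f(theta + dl) - f(theta) for dl in deltas])
    # L1 regression via iteratively reweighted least squares (IRLS),
    # initialized from the ordinary least-squares solution.
    g = np.linalg.lstsq(deltas, ys, rcond=None)[0]
    for _ in range(irls_iters):
        w = 1.0 / np.maximum(np.abs(ys - deltas @ g), 1e-6)
        wd = deltas * w[:, None]
        g = np.linalg.lstsq(wd.T @ deltas, wd.T @ ys, rcond=None)[0]
    return g

# Tiny illustrative usage on a quadratic with a known gradient:
# f = lambda z: -np.sum(z ** 2); theta = np.ones(5)
# print(robust_gradient(f, theta))   # should be roughly -2 * theta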