Vikas Sindhwani

Vikas Sindhwani

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and wholebody multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching. View details
    Robotic Table Tennis: A Case Study into a High Speed Learning System
    Jon Abelian
    Saminda Abeyruwan
    Michael Ahn
    Justin Boyd
    Erwin Johan Coumans
    Omar Escareno
    Wenbo Gao
    Navdeep Jaitly
    Juhana Kangaspunta
    Satoshi Kataoka
    Gus Kouretas
    Yuheng Kuang
    Corey Lynch
    Thinh Nguyen
    Ken Oslund
    Barney J. Reed
    Anish Shankar
    Avi Singh
    Grace Vesom
    Peng Xu
    Robotics: Science and Systems (2023)
    Preview abstract We present a deep-dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material. View details
    Single-Level Differentiable Contact Simulation
    Simon Le Cleac'h
    Mac Schwager
    Zachary Manchester
    Pete Florence
    IEEE RAL (2023)
    Preview abstract We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because isolating the collision detection problem introduces contact location non-uniqueness. Our approach combines contact simulation and collision detection into a unified single-level optimization problem. This disambiguates the collision detection problem in a physics-informed manner. Compared to previous differentiable simulation approaches, our formulation features improved simulation robustness and computational complexity improved by more than an order of magnitude. We provide a numerically efficient implementation of our formulation in the Julia language called \href{https://github.com/simon-lc/DojoLight.jl}{DojoLight.jl}. View details
    Trajectory Optimization with Optimization-Based Dynamics
    Taylor Howell
    Simon Le Cleac'h
    Pete Florence
    Zachary Manchester
    ICRA (2022)
    Preview abstract We present a framework for bi-level trajectory optimization in which a system's dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimizer. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth behavior to be abstracted away from the upper-level optimizer, and allows classical unconstrained optimizers to synthesize trajectories for more complex systems. We provide a path-following method for efficient evaluation of constrained dynamics and utilize the implicit-function theorem to compute smooth gradients of this representation. We demonstrate the framework by modeling systems from locomotion, aerospace, and manipulation domains including: acrobot with joint limits, cart-pole subject to Coulomb friction, Raibert hopper, rocket landing with thrust limits, and planar-push task with optimization-based dynamics and then optimize trajectories using iterative LQR. View details
    Preview abstract Indirect trajectory optimization methods such as Differential Dynamic Programming (DDP) have found considerable success when only planning under dynamic feasibility constraints. Meanwhile, nonlinear programming (NLP) has been the state-of-the-art approach when faced with additional constraints (e.g., control bounds, obstacle avoidance). However, a na{\"i}ve implementation of NLP algorithms, e.g., shooting-based sequential quadratic programming (SQP), may suffer from slow convergence -- caused from natural instabilities of the underlying system manifesting as poor numerical stability within the optimization. Re-interpreting the DDP closed-loop rollout policy as a \emph{sensitivity-based correction to a second-order search direction}, we demonstrate how to compute analogous closed-loop policies (i.e., feedback gains) for \emph{constrained} problems. Our key theoretical result introduces a novel dynamic programming-based constraint-set recursion that augments the canonical ``cost-to-go" backward pass. On the algorithmic front, we develop a hybrid-SQP algorithm incorporating DDP-style closed-loop rollouts, enabled via efficient \emph{parallelized} computation of the feedback gains. Finally, we validate our theoretical and algorithmic contributions on a set of increasingly challenging benchmarks, demonstrating significant improvements in convergence speed over standard open-loop SQP. View details
    Hybrid Random Features
    Haoxian Chen
    Han Lin
    Yuanzhe Ma
    Arijit Sehanobish
    Michael Ryoo
    Jake Varley
    Andy Zeng
    Valerii Likhosherstov
    Dmitry Kalashnikov
    Adrian Weller
    International Conference on Learning Representations (ICLR) (2022)
    Preview abstract We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi & Recht, 2007) or (recently introduced in the context of linear-attention Transformers) positive random features (Choromanski et al., 2021b). By generalizing Bochner’s Theorem for softmax/Gaussian kernels and leveraging random features for compositional kernels, the HRF-mechanism provides strong theoretical guarantees - unbiased approximation and strictly smaller worst-case relative errors than its counterparts. We conduct exhaustive empirical evaluation of HRF ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure to benchmarking implicit-attention Transformers (also for downstream Robotics applications), demonstrating its quality in a wide spectrum of machine learning problems. View details
    Preview abstract Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through Socratic Models (SMs): a modular framework in which multiple pretrained models may be composed zero-shot i.e., via multimodal-informed prompting, to exchange information with each other and capture new multimodal capabilities, without requiring finetuning. With minimal engineering, SMs are not only competitive with state-of-the-art zero-shot image captioning and video-to-text retrieval, but also enable new applications such as (i) answering free-form questions about egocentric video, (ii) engaging in multimodal assistive dialogue with people (e.g., for cooking recipes) by interfacing with external APIs and databases (e.g., web search), and (iii) robot perception and planning. Prototypes are available at socraticmodels.github.io View details
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
    Anthony G. Francis
    Dmitry Kalashnikov
    Edward Lee
    Jake Varley
    Leila Takayama
    Mikael Persson
    Peng Xu
    Stephen Tu
    Xuesu Xiao
    Conference on Robot Learning (2022) (to appear)
    Preview abstract Despite decades of research, existing navigation systems still face real-world challenges when being deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers---a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves 40% better goal reached in cluttered environments and 65% better sociability when navigating around humans. View details
    Preview abstract We propose an end-to-end framework to enablemultipurpose assistive mobile robots to autonomously wipetables and clean spills and crumbs. This problem is chal-lenging, as it requires planning wiping actions with uncertainlatent crumbs and spill dynamics over high-dimensional visualobservations, while simultaneously guaranteeing constraintssatisfaction to enable deployment in unstructured environments.To tackle this problem, we first propose a stochastic differentialequation (SDE) to model crumbs and spill dynamics and ab-sorption with the robot wiper. Then, we formulate a stochasticoptimal control for planning wiping actions over visual obser-vations, which we solve using reinforcement learning (RL). Wethen propose a whole-body trajectory optimization formulationto compute joint trajectories to execute wiping actions whileguaranteeing constraints satisfaction. We extensively validateour table wiping approach in simulation and on hardware. View details
    Preview abstract Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human interaction. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability. View details