Jump to Content
Vikas Sindhwani

Vikas Sindhwani

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Preview abstract We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and wholebody multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching. View details
    Robotic Table Tennis: A Case Study into a High Speed Learning System
    Jon Abelian
    Saminda Abeyruwan
    Michael Ahn
    Justin Boyd
    Erwin Johan Coumans
    Omar Escareno
    Wenbo Gao
    Navdeep Jaitly
    Juhana Kangaspunta
    Satoshi Kataoka
    Gus Kouretas
    Yuheng Kuang
    Corey Lynch
    Thinh Nguyen
    Ken Oslund
    Barney J. Reed
    Anish Shankar
    Avi Singh
    Grace Vesom
    Peng Xu
    Robotics: Science and Systems (2023)
    Preview abstract We present a deep-dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material. View details
    Preview abstract We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because isolating the collision detection problem introduces contact location non-uniqueness. Our approach combines contact simulation and collision detection into a unified single-level optimization problem. This disambiguates the collision detection problem in a physics-informed manner. Compared to previous differentiable simulation approaches, our formulation features improved simulation robustness and computational complexity improved by more than an order of magnitude. We provide a numerically efficient implementation of our formulation in the Julia language called \href{https://github.com/simon-lc/DojoLight.jl}{DojoLight.jl}. View details
    Preview abstract Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human interaction. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability. View details
    Hybrid Random Features
    Haoxian Chen
    Han Lin
    Yuanzhe Ma
    Arijit Sehanobish
    Michael Ryoo
    Jake Varley
    Valerii Likhosherstov
    Dmitry Kalashnikov
    Adrian Weller
    International Conference on Learning Representations (ICLR) (2022)
    Preview abstract We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi & Recht, 2007) or (recently introduced in the context of linear-attention Transformers) positive random features (Choromanski et al., 2021b). By generalizing Bochner’s Theorem for softmax/Gaussian kernels and leveraging random features for compositional kernels, the HRF-mechanism provides strong theoretical guarantees - unbiased approximation and strictly smaller worst-case relative errors than its counterparts. We conduct exhaustive empirical evaluation of HRF ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure to benchmarking implicit-attention Transformers (also for downstream Robotics applications), demonstrating its quality in a wide spectrum of machine learning problems. View details
    Continuous Control and Multiscale Sensor Fusion with Neural CDEs
    Francis Edward McCann Ramirez
    Jake Varley
    IROS & RSS Imitation Learning Workshop (2022)
    Preview abstract Even though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control. Machines operate on multiple asynchronous sensing modalities each with different frequencies, e.g., video frames at 30Hz, proprioceptive state at 100Hz, force-torque data at 500Hz, etc. While the classic approach is to batch observations into fixed-time windows then pass them through feed-forward encoders (e.g., with deep networks), we show that there exists a more elegant approach -- one that treats policy learning as modeling latent state dynamics in continuous-time. Specifically, we present 'InFuser', a unified architecture that trains continuous time-policies with Neural Controlled Differential Equations (CDEs). 'InFuser' evolves a single latent state representation over time by (In)tegrating and (Fus)ing multi-sensory observations (arriving at different frequencies), and inferring actions in continuous-time. This enables policies that can react to multi-frequency multi-sensory feedback for truly end-to-end visuomotor control, without discrete-time assumptions. Behavior cloning experiments demonstrate that 'InFuser' learns robust policies for dynamic tasks (e.g., swinging a ball into a cup) notably outperforming several baselines in settings where observations from one sensing modality can arrive at much sparser intervals than others. View details
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
    Anthony G. Francis
    Dmitry Kalashnikov
    Edward Lee
    Jake Varley
    Leila Takayama
    Mikael Persson
    Peng Xu
    Stephen Tu
    Xuesu Xiao
    Conference on Robot Learning (2022) (to appear)
    Preview abstract Despite decades of research, existing navigation systems still face real-world challenges when being deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers---a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves 40% better goal reached in cluttered environments and 65% better sociability when navigating around humans. View details
    Preview abstract We present a framework for bi-level trajectory optimization in which a system's dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimizer. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth behavior to be abstracted away from the upper-level optimizer, and allows classical unconstrained optimizers to synthesize trajectories for more complex systems. We provide a path-following method for efficient evaluation of constrained dynamics and utilize the implicit-function theorem to compute smooth gradients of this representation. We demonstrate the framework by modeling systems from locomotion, aerospace, and manipulation domains including: acrobot with joint limits, cart-pole subject to Coulomb friction, Raibert hopper, rocket landing with thrust limits, and planar-push task with optimization-based dynamics and then optimize trajectories using iterative LQR. View details
    Preview abstract We propose an end-to-end framework to enablemultipurpose assistive mobile robots to autonomously wipetables and clean spills and crumbs. This problem is chal-lenging, as it requires planning wiping actions with uncertainlatent crumbs and spill dynamics over high-dimensional visualobservations, while simultaneously guaranteeing constraintssatisfaction to enable deployment in unstructured environments.To tackle this problem, we first propose a stochastic differentialequation (SDE) to model crumbs and spill dynamics and ab-sorption with the robot wiper. Then, we formulate a stochasticoptimal control for planning wiping actions over visual obser-vations, which we solve using reinforcement learning (RL). Wethen propose a whole-body trajectory optimization formulationto compute joint trajectories to execute wiping actions whileguaranteeing constraints satisfaction. We extensively validateour table wiping approach in simulation and on hardware. View details
    Preview abstract Indirect trajectory optimization methods such as Differential Dynamic Programming (DDP) have found considerable success when only planning under dynamic feasibility constraints. Meanwhile, nonlinear programming (NLP) has been the state-of-the-art approach when faced with additional constraints (e.g., control bounds, obstacle avoidance). However, a na{\"i}ve implementation of NLP algorithms, e.g., shooting-based sequential quadratic programming (SQP), may suffer from slow convergence -- caused from natural instabilities of the underlying system manifesting as poor numerical stability within the optimization. Re-interpreting the DDP closed-loop rollout policy as a \emph{sensitivity-based correction to a second-order search direction}, we demonstrate how to compute analogous closed-loop policies (i.e., feedback gains) for \emph{constrained} problems. Our key theoretical result introduces a novel dynamic programming-based constraint-set recursion that augments the canonical ``cost-to-go" backward pass. On the algorithmic front, we develop a hybrid-SQP algorithm incorporating DDP-style closed-loop rollouts, enabled via efficient \emph{parallelized} computation of the feedback gains. Finally, we validate our theoretical and algorithmic contributions on a set of increasingly challenging benchmarks, demonstrating significant improvements in convergence speed over standard open-loop SQP. View details
    Preview abstract Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through Socratic Models (SMs): a modular framework in which multiple pretrained models may be composed zero-shot i.e., via multimodal-informed prompting, to exchange information with each other and capture new multimodal capabilities, without requiring finetuning. With minimal engineering, SMs are not only competitive with state-of-the-art zero-shot image captioning and video-to-text retrieval, but also enable new applications such as (i) answering free-form questions about egocentric video, (ii) engaging in multimodal assistive dialogue with people (e.g., for cooking recipes) by interfacing with external APIs and databases (e.g., web search), and (iii) robot perception and planning. Prototypes are available at socraticmodels.github.io View details
    Preview abstract Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as ``place the item inside the bag". In this work, we develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures, including tasks that involve image-based goal-conditioning and multi-step deformable manipulation. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for robotic manipulation that uses learned template matching to infer displacements that can represent pick and place actions. We demonstrate that goal-conditioned Transporter Networks enable agents to manipulate deformable structures into flexibly specified configurations without test-time visual anchors for target locations. We also significantly extend prior results using Transporter Networks for manipulating deformable objects by testing on tasks with 2D and 3D deformables. View details
    Stochastic Flows and Geometric Optimization on the Orthogonal Group
    David Cheikhi
    Jared Davis
    Valerii Likhosherstov
    Achille Nazaret
    Achraf Bahamou
    Xingyou Song
    Mrugank Akarte
    Jack Parker-Holder
    Jacob Bergquist
    Yuan Gao
    Aldo Pacchiano
    Adrian Weller
    Thirty-seventh International Conference on Machine Learning (ICML 2020) (to appear)
    Preview abstract We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group O(d) and naturally reductive homogeneous manifolds obtained from the action of the rotation group SO(d). We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult Humanoid agent from OpenAI Gym and improving convolutional neural networks. View details
    An Ode to an ODE
    Jared Quincy Davis
    Valerii Likhosherstov
    Jean-Jacques Slotine
    Jake Varley
    Honglak Lee
    Adrian Weller
    NeurIPS 2020 (2020)
    Preview abstract We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem which is intrinsically related to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show on the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds. View details
    Preview abstract Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass object(s) or an end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input -- which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential tasks, as well as 6DoF pick-and-place. Experiments on 10 simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our methods with hardware in the real world. View details
    Unsupervised Anomaly Detection for Self-flying Delivery Drones
    Hakim Sidahmed
    Brandon Jones
    International Conference on Robotics and Automation (2020)
    Preview abstract We propose a novel anomaly detection framework for a fleet of hybrid aerial vehicles executing high-speed package pickup and delivery missions. The detection is based on machine learning models of normal flight profiles, trained on millions of flight log measurements of control inputs and sensor readings. We develop a new scalable algorithm for robust regression which can simultaneously fit predictive flight dynamics models while identifying and discarding abnormal flight missions from the training set. The resulting unsupervised estimator has a very high breakdown point and can withstand massive contamination of training data to uncover what normal flight patterns look like, without requiring any form of prior knowledge of aircraft aerodynamics or manual labeling of anomalies upfront. Across many different anomaly types, spanning simple 3-sigma statistical thresholds to turbulence and other equipment anomalies, our models achieve high detection rates across the board. Our method consistently outperforms alternative robust detection methods on synthetic benchmark problems. To the best of our knowledge, dynamics modeling of hybrid delivery drones for anomaly detection at the scale of 100 million measurements from 5000 real flight missions in variable flight conditions is unprecedented. View details
    Preview abstract We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand stroke, whilst achieving 80\% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies. View details
    From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
    Aldo Pacchiano
    Jack Parker-Holder
    Yunhao Tang
    Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019)
    Preview abstract We present a new algorithm ASEBO for optimizing high-dimensional blackbox functions. ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly. It addresses the exploration-exploitation trade-off of blackbox optimization with expensive blackbox queries by continuously learning the bias of the lower-dimensional model used to approximate gradients of smoothings of the function via compressed sensing and contextual bandits methods. To obtain this model, it leverages techniques from the emerging theory of active subspaces in the novel ES blackbox optimization context. As a result, ASEBO learns the dynamically changing intrinsic dimensionality of the gradient space and adapts to the hardness of different stages of the optimization without external supervision. Consequently, it leads to more sample-efficient blackbox optimization than state-of-the-art algorithms. We provide theoretical results and test ASEBO advantages over other methods empirically by evaluating it on the set of reinforcement learning policy optimization tasks as well as functions from the recently open-sourced Nevergrad library. View details
    Preview abstract Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state of the art methods for policy optimization problems in Robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to 23% of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks including path tracking for quadruped robots. View details
    Preview abstract We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity. View details
    Learning-based Air Data System for Safe and Efficient Control of Fixed-Wing Aerial Vehicles
    Brandon Jones
    Damien Jourdan
    Maciej Chociej
    Byron Boots
    THE 16TH IEEE INTERNATIONAL SYMPOSIUM ON SAFETY, SECURITY, AND RESCUE ROBOTICS (SSRR-2018) (to appear)
    Preview abstract We develop an air data system for aerial robots executing high-speed outdoor missions subject to significant aerodynamic forces on their bodies. The system is based on a combination of Extended Kalman Filtering (EKF) and autoregressive feedforward Neural Networks, relying only on IMU sensors and GPS. This eliminates the need to instrument the vehicle with Pitot tubes and mechanical vanes, reducing associated cost, weight, maintenance requirements and likelihood of catastrophic mechanical failures. The system is trained to clone the behaviour of Pitot-tube measurements on thousands of instrumented simulated and real flights, and does not require a vehicle aerodynamics model. We demonstrate that safe guidance and navigation is possible in executing complex maneuvers in the presence of wind gusts without relying on airspeed sensors. We also demonstrate accuracy enhancements from successful “simulation-to-reality” transfer and dataset aggregation techniques to correct for training-test distribution mismatches when the air-data system and the control stack operate in closed loop. View details
    The Geometry of Random Features
    Mark Rowland
    Richard Turner
    Adrian Weller
    International Conference on Artificial Intelligence and Statistics (AISTATS) (2018)
    Preview abstract We present an in-depth examination of the effectiveness of radial basis function kernel (beyond Gaussian) estimators based on orthogonal random feature maps. We show that orthogonal estimators outperform state-of-the-art mechanisms that use iid sampling under weak conditions for tails of the associated Fourier distributions. We prove that for the case of many dimensions, the superiority of the orthogonal transform over iid methods can be accurately measured by a property we define called the charm of the kernel, and that orthogonal random features provide optimal kernel estimators. Furthermore, we provide the first theoretical results which explain why orthogonal random features outperform unstructured on downstream tasks such as kernel ridge regression by showing that orthogonal random features provide kernel algorithms with better spectral properties than the previous state-of-the-art. Our results enable practitioners more generally to estimate the benefits from applying orthogonal transforms. View details
    Preview abstract We propose a simple drop-in noise-tolerant replacement for the standard finite difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of deterministic or randomized structured matrices. We show that at the small cost of computing a Fast Fourier Transform (FFT), such structured finite differences consistently give higher quality approximation of gradients and Jacobians in comparison to vanilla approaches that use coordinate directions or random Gaussian perturbations. We show that linearization of noisy, blackbox dynamics using our methods leads to improved performance of trajectory optimizers like Iterative LQR and Differential Dynamic Programming on several classic continuous control tasks. By embedding structured exploration in implicit filtering methods, we are able to learn agile walking and turning policies for quadruped locomotion, that successfully transfer from simulation to actual hardware. We give a theoretical justification of our methods in terms of bounds on the quality of gradient reconstruction in the presence of noise. View details
    Preview abstract We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies we learn have several advantages over unstructured ones, including faster training algorithms and faster inference. These benefits are important when the policy is deployed on real hardware with limited resources. Further, compact policies provide more scalable architectures for derivative-free optimization (DFO) in high-dimensional spaces. We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters, with almost linear time complexity of the inference phase, with up to 13x fewer parameters relative to the Evolution Strategies (ES) algorithm introduced by Salimans et al. (2017). We do not need heuristics such as fitness shaping to learn good quality policies, resulting in a simple and theoretically motivated training mechanism. View details
    Learning Stabilizable Dynamical Systems via Control Contraction Metrics
    Sumeet Singh
    Jean-Jacques Slotine
    Marco Pavone
    Workshop on Algorithmic Foundations of Robotics (WAFR-2018) (to appear)
    Preview abstract We propose a novel framework for learning stabilizable nonlinear dynamical systems for continuous control tasks in robotics. The key idea is to develop a new control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, which guarantees that the learned system can be accompanied by a robust controller capable of stabilizing any open-loop trajectory that the system may generate. By leveraging tools from contraction theory, statistical learning, and convex optimization, we provide a general and tractable semi-supervised algorithm to learn stabilizable dynamics, which can be applied to complex underactuated systems. We validated the proposed algorithm on a simulated planar quadrotor system and observed notably improved trajectory generation and tracking performance with the control-theoretic regularized model over models learned using traditional regression techniques, especially when using a small number of demonstration examples. The results presented illustrate the need to infuse standard model-based reinforcement learning algorithms with concepts drawn from nonlinear control theory for improved reliability. View details
    Manifold Regularization for Kernelized LSTD
    Xinyan Yan
    Byron Boots
    1st Annual Conference on Robot Learning (CoRL 2017) (to appear)
    Preview abstract Policy evaluation or value function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore, its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in action-state value function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality. View details
    Sequential Operator Splitting for Constrained Nonlinear Optimal Control
    Rebecca Roelofs
    Mrinal Kalakrishnan
    American Control Conference (ACC) (2017)
    Preview abstract We develop TROSS, a solver for constrained nonsmooth trajectory optimization based on a sequential operator splitting framework. TROSS iteratively improves trajectories by solving a sequence of subproblems setup within evolving trust regions around current iterates using the Alternating Direction Method of Multipliers (ADMM). TROSS achieves consensus among competing objectives, such as finding low-cost dynamically feasible trajectories respecting control limits and safety constraints. A library of building blocks in the form of inexpensive and parallelizable proximal operators associated with trajectory costs and constraints can be used to configure the solver for a variety of tasks. The method shows faster cost reduction compared to iterative Linear Quadratic Regulator (iLQR) and Sequential Quadratic Programming (SQP) on a control-limited vehicle maneuvering task. We demonstrate TROSS on shortest-path navigation of a variant of Dubin’s car in the presence of obstacles, while exploiting passive dynamics of the system. When applied to a constrained robust state estimation problem involving nondifferentiable nonconvex penalties, TROSS shows less susceptibility to non-Gaussian dynamics disturbances and measurement outliers in comparison to an Extended Kalman smoother. Unlike generic SQP methods, our approach produces time-varying linear feedback control policies even for constrained control tasks. The solver is potentially suitable for nonlinear model predictive control and moving horizon state estimation in embedded systems. View details
    Preview abstract From a small number of calls to a given “blackbox"" on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian such as sparsity and symmetry while being noise robust. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data-efficiency in the task of linearizing the dynamics of a rigid body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems. View details
    Preview abstract From a small number of calls to a given “blackbox" on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian such as sparsity and symmetry while being noise robust. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data-efficiency in the task of linearizing the dynamics of a rigid body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems. View details
    Geometry of 3D Environments and Sum of Squares Polynomials
    Ameer Ali Ahmadi
    Georgina Hall
    Robotics: Science and Systems (2017)
    Preview abstract Motivated by applications in robotics and computer vision, we study problems related to spatial reasoning of a 3D environment using sublevel sets of polynomials. These include: tightly containing a cloud of points (e.g., representing an obstacle) with convex or nearly-convex basic semialgebraic sets, computation of Euclidean distances between two such sets, separation of two convex basic semalgebraic sets that overlap, and tight containment of the union of several basic semialgebraic sets with a single convex one. We use algebraic techniques from sum of squares optimization that reduce all these tasks to semidefinite programs of small size and present numerical experiments in realistic scenarios. View details
    Preview abstract Policy evaluation or value/Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality. View details
    Learning Compact Recurrent Neural Networks
    Zhiyun Lu
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
    Preview
    Recycling Randomness with Structure for Sublinear time Kernel Expansions
    Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, JMLR.org, pp. 2502-2510
    Preview abstract We propose a scheme for recycling Gaussian random vectors into structured matrices to approximate various kernel functions in sublinear time via random embeddings. Our framework includes the Fastfood construction of Le et al. (2013) as a special case, but also extends to Circulant, Toeplitz and Hankel matrices, and the broader family of structured matrices that are characterized by the concept of lowdisplacement rank. We introduce notions of coherence and graph-theoretic structural constants that control the approximation quality, and prove unbiasedness and low-variance properties of random feature maps that arise within our framework. For the case of low-displacement matrices, we show how the degree of structure and randomness can be controlled to reduce statistical variance at the cost of increased computation and storage requirements. Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features. View details
    Preview abstract We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices. We propose a uni- fied framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state of the art models, while providing more than 3.5-fold compression. View details
    No Results Found