Aleksandra Faust

Aleksandra Faust is a Research Director at Google DeepMind. Her research centers on safe and scalable autonomous systems for social good, including reinforcement learning, planning, and control for robotics, autonomous driving, and digital assistants. Previously, Aleksandra co-founded Reinforcement Learning Research in Google Brain, founded Task and Motion Planning research in Robotics at Google and machine learning for self-driving car planning and controls in Waymo, and was a senior researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science with distinction at the University of New Mexico and a Master's in Computer Science from the University of Illinois at Urbana-Champaign. Aleksandra won the IEEE RAS Early Career Award for Industry, the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in 2011-2014, and was named Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, ZDNet, and VentureBeat, and was awarded Best Paper in Service Robotics at ICRA 2018, Best Paper in Reinforcement Learning for Real Life (RL4RL) at ICML 2019, Best Paper of IEEE Computer Architecture Letters in 2020, and an IEEE Micro Top Picks 2023 Honorable Mention.

Note: I am in Google DeepMind now, and this page is out of date. See www.afaust.info for up-to-date information.
Authored Publications
    Multimodal Web Navigation with Instruction-Finetuned Foundation Models
    Hiroki Furuta
    Ofir Nachum
    Yutaka Matsuo
    Shane Gu
    Izzeddin Gur
    International Conference on Learning Representations (ICLR) (2024)
    Abstract: The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and by domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data. In this work, we study data-driven offline training for web agents with vision-language foundation models. We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages and outputs web navigation actions, such as click and type. WebGUM is trained by jointly finetuning an instruction-finetuned language model and a vision encoder with temporal and local perception on a large corpus of demonstrations. We empirically demonstrate that this recipe improves the agent's grounded multimodal perception, HTML comprehension, and multi-step reasoning, outperforming prior work by a significant margin. On MiniWoB, we improve over the previous best offline methods by more than 45.8%, even outperforming the online-finetuned SoTA, humans, and a GPT-4-based agent. On the WebShop benchmark, our 3-billion-parameter model achieves superior performance to the existing SoTA, PaLM-540B. Furthermore, WebGUM exhibits strong positive transfer to real-world planning tasks on Mind2Web. We also collect 347K high-quality demonstrations using our trained models, 38 times larger than prior work, and make them available to promote future research in this direction.
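The abstract describes an agent that consumes webpage screenshots plus HTML and emits click/type actions. As a hedged illustration of what such an interface might look like (all class and function names here are hypothetical, not the authors' code or API):

```python
# Minimal sketch of a multimodal web-navigation agent interface in the spirit
# of the abstract above. All names (WebAgent, Observation, etc.) are
# illustrative assumptions, not the WebGUM implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    screenshot_png: bytes   # rendered page image
    html: str               # raw HTML source
    instruction: str        # natural-language task, e.g. "book a flight to SFO"


@dataclass
class Action:
    kind: str        # "click" or "type"
    element_id: str  # target DOM element
    text: str = ""   # text to enter for "type" actions


class WebAgent:
    """Joint vision+language policy: screenshot and HTML in, web action out."""

    def act(self, obs: Observation, history: List[Action]) -> Action:
        # A real agent would encode the screenshot with a vision encoder,
        # tokenize the HTML and instruction with a finetuned language model,
        # and decode the next action. Here we return a fixed placeholder.
        return Action(kind="click", element_id="submit-button")


if __name__ == "__main__":
    agent = WebAgent()
    obs = Observation(screenshot_png=b"", html="<html></html>",
                      instruction="search for running shoes")
    print(agent.act(obs, history=[]))
```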
    Abstract: We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.
    Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots
    Sabrina Neuman
    Brian Plancher
    Bardienus Pieter Duisterhof
    Srivatsan Krishnan
    Colby R. Banbury
    Mark Mazumder
    Shvetank Prakash
    Jason Jabbour
    Guido C. H. E. de Croon
    Vijay Janapa Reddi
    IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) special session on Low Power Autonomous Systems (2022) (to appear)
    Abstract: Machine learning (ML) has become a pervasive tool across computing systems. An emerging application that stress-tests the challenges of ML system design is tiny robot learning, the deployment of ML on resource-constrained low-cost autonomous robots. Tiny robot learning lies at the intersection of embedded systems, robotics, and ML, compounding the challenges of these domains. Tiny robot learning is subject to challenges from size, weight, area, and power (SWAP) constraints; sensor, actuator, and compute hardware limitations; end-to-end system tradeoffs; and a large diversity of possible deployment scenarios. Tiny robot learning requires ML models to be designed with these challenges in mind, providing a crucible that reveals the necessity of holistic ML system design and automated end-to-end design tools for agile development. This paper gives a brief survey of the tiny robot learning space, elaborates on key challenges, and proposes promising opportunities for future work in ML system design.
    Abstract: The paper proposes a novel training approach for a neural network to perform switching among an array of computationally generated stochastic optimal feedback controllers. The training is based on the outputs of offline-computed lookup-table metric (LTM) values that store information about individual controller performance. Our study is based on the problem of navigating a bicycle kinematic model through a sequence of gates; a more traditional approach to the training is based on kinematic variables (KVs) describing the bicycle-gate relative position. We compare the LTM- and KV-based training approaches on the navigation problem and find that LTM training converges faster and with less variation than KV-based training. Our results include numerical simulations illustrating the work of the LTM-trained neural network switching policy.
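The abstract contrasts training a controller-switching network on precomputed lookup-table metric (LTM) values versus raw kinematic variables. A rough sketch of the LTM-style setup, under the assumption that each candidate controller has a precomputed performance score per state (all quantities below are synthetic and illustrative, not the paper's data or model):

```python
# Rough sketch: train a switching policy to pick, for each state, the
# controller with the best precomputed lookup-table metric (LTM) value.
# Uses a simple softmax classifier in numpy; all numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, n_controllers = 500, 4, 3

states = rng.normal(size=(n_states, n_features))   # e.g. bicycle-gate features
ltm = rng.normal(size=(n_states, n_controllers))   # offline controller scores
labels = ltm.argmax(axis=1)                        # best controller per state

# One-layer softmax classifier trained by gradient descent.
W = np.zeros((n_features, n_controllers))
for step in range(200):
    logits = states @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(n_controllers)[labels]
    grad = states.T @ (probs - onehot) / n_states
    W -= 0.5 * grad

accuracy = ((states @ W).argmax(axis=1) == labels).mean()
print("switching accuracy on the synthetic data:", accuracy)
```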
    Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization
    Sungryull Sohn
    Hyunjae Woo
    Jongwook Choi
    Lyubing Qiang
    Izzeddin Gur
    Honglak Lee
    Uncertainty in Artificial Intelligence (UAI) (2022) (to appear)
    Abstract: We tackle real-world problems with complex structures beyond pixel-based games or simulators. We formulate this as a few-shot reinforcement learning problem, where a task is characterized by a subtask graph that defines a set of subtasks and their dependencies, unknown to the agent. Unlike previous meta-RL methods that try to directly infer an unstructured task embedding, our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks, and uses it as a prior to improve task inference at test time. Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks than various existing algorithms, such as meta reinforcement learning, hierarchical reinforcement learning, and other heuristic agents.
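The method centers on representing a task as a subtask graph: a set of subtasks plus precedence dependencies, inferred from training tasks before adapting to a new one. A minimal sketch of such a data structure and an eligibility check (names and the toy graph are illustrative only, not the paper's implementation):

```python
# Minimal sketch of a subtask graph: subtasks with precondition dependencies,
# plus a check for which subtasks are currently eligible to execute.
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SubtaskGraph:
    # Maps each subtask to the set of subtasks that must be completed first.
    preconditions: Dict[str, Set[str]] = field(default_factory=dict)

    def eligible(self, completed: Set[str]) -> Set[str]:
        """Subtasks whose preconditions are all satisfied and not yet done."""
        return {s for s, pre in self.preconditions.items()
                if pre <= completed and s not in completed}


if __name__ == "__main__":
    graph = SubtaskGraph(preconditions={
        "open_page": set(),
        "fill_form": {"open_page"},
        "submit": {"fill_form"},
    })
    print(graph.eligible(completed={"open_page"}))  # -> {'fill_form'}
```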
    QuaRL: Quantization for Fast and Environmentally Sustainable Reinforcement Learning
    Gabe Barth-Maron
    Maximilian Lam
    Sharad Chitlangia
    Srivatsan Krishnan
    Vijay Janapa Reddi
    Zishen Wan
    Transactions on Machine Learning Research (TMLR) (2022)
    Abstract: Deep reinforcement learning continues to show tremendous potential in achieving task-level autonomy; however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, ActorQ, to speed up actor-learner distributed RL training. ActorQ leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system, ActorQ, demonstrates end-to-end speedups of 1.5x-2.5x and faster convergence over full-precision training on a range of tasks (DeepMind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (kg of CO2) of ActorQ versus standard reinforcement learning on various tasks. Across various settings, we show that ActorQ enables more environmentally friendly reinforcement learning by achieving 2.8x less carbon emission and energy compared to training RL agents in full precision. Finally, we demonstrate empirically that aggressively quantized RL policies (to as few as 4/5 bits) enable significant speedups on quantization-friendly (supporting native quantization) resource-constrained edge devices, without degrading accuracy. We believe that this is the first of many future works on enabling computationally energy-efficient and sustainable reinforcement learning. The source code for QuaRL is publicly available at https://bit.ly/quarl-tmlr.
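ActorQ's key idea is that actors collect data with low-precision copies of the policy while the learner keeps full precision. A back-of-the-envelope sketch of symmetric 8-bit weight quantization and dequantization, the kind of step an actor might apply to a policy snapshot (illustrative only, not the QuaRL/ActorQ implementation):

```python
# Sketch of symmetric per-tensor int8 quantization of policy weights.
import numpy as np


def quantize_int8(w: np.ndarray):
    """Map float weights to int8 values plus a per-tensor scale."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=(256, 64)).astype(np.float32)
    q, scale = quantize_int8(weights)
    err = np.abs(dequantize(q, scale) - weights).max()
    print("max absolute quantization error:", err)
```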
    Abstract: We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first-stage landmark detector and a second-stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8b images, we identify 1.1m English, Hindi, and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, diverse paths obtain 61-64% SRs in three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.
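The system is described as a two-stage pipeline: detect visual landmarks along the route, then feed them to a multilingual encoder-decoder that writes the instruction. A schematic sketch of that pipeline wiring (all classes are hypothetical stand-ins, not the MARKY-MT5 models):

```python
# Schematic two-stage pipeline: landmark detection followed by instruction
# generation. Placeholder logic only; real components are learned models.
from typing import List


class LandmarkDetector:
    def detect(self, panoramas: List[str]) -> List[str]:
        # A real detector grounds landmark phrases to panorama regions.
        return [f"landmark near the {p}" for p in panoramas]


class InstructionGenerator:
    def generate(self, landmarks: List[str], language: str = "en") -> str:
        # A real generator is a multimodal, multilingual encoder-decoder.
        return ", then ".join(f"walk past the {l}" for l in landmarks)


if __name__ == "__main__":
    panos = ["kitchen", "hallway", "stairs"]
    landmarks = LandmarkDetector().detect(panos)
    print(InstructionGenerator().generate(landmarks))
```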
    Multi-Task Learning with Sequence-Conditioned Transporter Networks
    Michael Lim
    Andy Zeng
    Brian Andrew Ichter
    Maryam Bandari
    Erwin Johan Coumans
    Claire Tomlin
    Stefan Schaal
    International Conference on Robotics and Automation 2022, IEEE (to appear)
    Abstract: Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of benchmarks specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long-horizon problems. Our analysis suggests not only that the new framework significantly improves pick-and-place performance on the 10 novel multi-task benchmark problems, but also that multi-task learning with weighted sampling can vastly improve learning and agent performance on individual tasks.
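One ingredient named in the abstract is weighted sampling of tasks during multi-task training, so that some task modules are drawn more often than others. A small sketch of non-uniform task sampling (task names and weights are made up for illustration, not taken from the paper):

```python
# Small sketch of weighted task sampling for multi-task training: the sampler
# draws training tasks in proportion to hand-chosen weights.
import random
from collections import Counter

tasks = ["placing", "chaining", "routing", "stacking"]
weights = [1.0, 2.0, 2.0, 3.0]  # e.g. upweight longer-horizon task modules

draws = random.choices(tasks, weights=weights, k=10_000)
print(Counter(draws))  # empirical draw counts roughly follow the weights
```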
    Roofline Model for UAVs: A Bottleneck Analysis Tool for Onboard Compute Characterization of Autonomous Unmanned Aerial Vehicles
    Srivatsan Krishnan
    Zishen Wan
    Kshitij Bhardwaj
    Ninad Jadhav
    Vijay Janapa Reddi
    IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2022)
    Abstract: We introduce an early-phase bottleneck analysis and characterization model called F-1 for designing computing systems that target autonomous Unmanned Aerial Vehicles (UAVs). The model provides insights by exploiting the fundamental relationships between various components in the autonomous UAV, such as the sensors, compute, and body dynamics. To guarantee safe operation while maximizing the performance (e.g., velocity) of the UAV, the compute, sensor, and other mechanical properties must be carefully selected or designed. The F-1 model provides visual insights that can aid a system architect in understanding the optimal compute design or selection for autonomous UAVs. The model is experimentally validated using real UAVs, and the error is between 5.1% and 9.5% compared to real-world flight tests. An interactive web-based tool for the F-1 model, called Skyline, is available free of cost at: https://bit.ly/skyline-tool
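The F-1 model is a roofline-style bottleneck analysis relating sensing, compute, and vehicle dynamics. As a heavily simplified, hedged illustration of that style of reasoning (a toy approximation I am assuming for exposition, not the paper's F-1 equations): the perception-action loop can only run as fast as its slowest stage, and that loop rate bounds how fast the UAV can safely fly.

```python
# Toy roofline-style bottleneck calculation: the slower of sensing and compute
# dominates the decision loop, and the loop latency bounds safe velocity.
# Illustrative approximation only; not the F-1 model from the paper.

def loop_latency(sensor_rate_hz: float, compute_latency_s: float) -> float:
    """The slower of sensing and compute dominates the decision loop."""
    return max(1.0 / sensor_rate_hz, compute_latency_s)


def max_safe_velocity(sensing_range_m: float, stop_distance_m: float,
                      latency_s: float) -> float:
    """Fly no faster than allows reacting within the remaining sensing range."""
    return max(sensing_range_m - stop_distance_m, 0.0) / latency_s


if __name__ == "__main__":
    for compute_ms in (10, 50, 200):
        lat = loop_latency(sensor_rate_hz=30.0, compute_latency_s=compute_ms / 1000)
        v = max_safe_velocity(sensing_range_m=10.0, stop_distance_m=2.0, latency_s=lat)
        print(f"compute {compute_ms} ms -> loop {lat*1000:.0f} ms -> v_max {v:.1f} m/s")
```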
    The Role of Compute in Autonomous Micro Aerial Vehicles: Optimizing for Flight Time and Energy Efficiency
    Behzad Boroujerdian
    Hasan Genc
    Srivatsan Krishnan
    Bardienus Pieter Duisterhof
    Brian Plancher
    Kayvan Mansoorshahi
    Marcelino Almeida
    Wenzhi Cui
    Vijay Janapa Reddi
    ACM Transactions on Computer Systems (TOCS) (2022) (to appear)
    Abstract: Autonomous and mobile cyber-physical machines are becoming an inevitable part of our future. In particular, Micro Aerial Vehicles (MAVs) have seen a resurgence in activity. With multiple use cases, such as surveillance, search and rescue, package delivery, and more, these unmanned aerial systems are on the cusp of demonstrating their full potential. Despite such promise, these systems face many challenges, one of the most prominent of which is their low endurance caused by their limited onboard energy. Since the success of a mission depends on whether the drone can finish it within this duration and before it runs out of battery, improving both the time and energy associated with the mission is of high importance. Such improvements have traditionally been achieved through better algorithms. But our premise is that more powerful and efficient onboard compute can also address the problem. In this paper, we investigate how the compute subsystem in a cyber-physical mobile machine, such as a Micro Aerial Vehicle, can impact mission time (time to complete a mission) and energy. Specifically, we pose the question as “what is the role of computing for cyber-physical mobile robots?” We show that compute and motion are tightly intertwined, and as such a close examination of cyber and physical processes and their impact on one another is necessary. We show different “impact paths” through which compute impacts mission metrics and examine them using a combination of analytical models, simulation, and micro- and end-to-end benchmarking. To enable similar studies, we open-sourced MAVBench, our toolset, which consists of (1) a closed-loop real-time feedback simulator and (2) an end-to-end benchmark suite comprised of state-of-the-art kernels. By combining MAVBench, analytical modeling, and an understanding of various compute impacts, we show up to 2X and 1.8X improvements in mission time and mission energy for two optimization case studies. Our investigations and optimizations show that cyber-physical co-design, a methodology in which both the cyber and physical processes/quantities of the robot are developed with consideration of one another, similar to hardware-software co-design, is necessary for arriving at the design of the optimal robot.
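The abstract argues that compute influences mission time and energy through its effect on what the vehicle can do, for example how fast it can safely fly. A back-of-the-envelope illustration of one such impact path (toy numbers and formulas I am assuming for exposition, not results or models from the paper):

```python
# Toy illustration of one "impact path": a compute change that enables higher
# velocity shortens the mission, which can reduce total energy even if the
# compute itself draws more power. All numbers are made up.

def mission_time_s(distance_m: float, velocity_mps: float) -> float:
    return distance_m / velocity_mps


def mission_energy_j(time_s: float, hover_power_w: float, compute_power_w: float) -> float:
    # Energy scales with how long rotors and compute must stay powered.
    return time_s * (hover_power_w + compute_power_w)


if __name__ == "__main__":
    distance = 1000.0  # metres
    for label, velocity, compute_power in (("slow compute", 2.0, 5.0),
                                           ("fast compute", 4.0, 15.0)):
        t = mission_time_s(distance, velocity)
        e = mission_energy_j(t, hover_power_w=150.0, compute_power_w=compute_power)
        print(f"{label}: {t:.0f} s, {e/1000:.1f} kJ")
```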