Montse Gonzalez Arenas
Montserrat currently is a Senior Research Engineer working at Google Brain, her research interest involve natural language processing techniques such as speech recognition. She also has developed projects about data analysis and statistical modeling. She is currently working at Google Brain Robotics.
Research Areas
Authored Publications
Sort By
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Jarek Rettinghouse
Daniel Ho
Julian Ibarz
Sangeetha Ramesh
Matt Bennice
Alexander Herzog
Chuyuan Kelly Fu
Adrian Li
Kim Kleiven
Jeff Bingham
Yevgen Chebotar
David Rendleman
Wenlong Lu
Mohi Khansari
Mrinal Kalakrishnan
Ying Xu
Noah Brown
Khem Holden
Justin Vincent
Peter Pastor Sampedro
Jessica Lin
David Dovo
Daniel Kappler
Mengyuan Yan
Sergey Levine
Jessica Lam
Jonathan Weisz
Paul Wohlhart
Karol Hausman
Cameron Lee
Bob Wei
Yao Lu
Preview abstract
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects.
View details
Robotic table wiping via whole-body trajectory optimizationand reinforcement learning
Benjie Holson
Jeffrey Bingham
Jonathan Weisz
Mario Prats
Peng Xu
Thomas Lew
Xiaohan Zhang
Yao Lu
ICRA (2022)
Preview abstract
We propose an end-to-end framework to enablemultipurpose assistive mobile robots to autonomously wipetables and clean spills and crumbs. This problem is chal-lenging, as it requires planning wiping actions with uncertainlatent crumbs and spill dynamics over high-dimensional visualobservations, while simultaneously guaranteeing constraintssatisfaction to enable deployment in unstructured environments.To tackle this problem, we first propose a stochastic differentialequation (SDE) to model crumbs and spill dynamics and ab-sorption with the robot wiper. Then, we formulate a stochasticoptimal control for planning wiping actions over visual obser-vations, which we solve using reinforcement learning (RL). Wethen propose a whole-body trajectory optimization formulationto compute joint trajectories to execute wiping actions whileguaranteeing constraints satisfaction. We extensively validateour table wiping approach in simulation and on hardware.
View details
Personalized Speech Recognition On Mobile Devices
Raziel Alvarez
David Rybach
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Preview abstract
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
View details