Skip to main content

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Research

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Resources

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Conferences & events

Careers

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Blog

About

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Google Research

Learn about all our AI

Google DeepMind

Explore the frontier of AI

Try our AI experiments

Conferences & events

Blog

Kanishka Rao

Home
People

Kanishka Rao

Research Areas

Machine intelligence

Authored Publications

results

Filter by:

Publications

Google 23
Other 0

Years

2023 1
2022 1
2020 1
2019 3
2018 3
2017 6
2016 3
2015 5

Research Areas

Data Mining and Modeling 1
Distributed Systems and Parallel Computing 3
Machine Intelligence 9
Robotics 2
Speech Processing 17

Teams

Athena 1
Language 2
Perception 2

Sort By

Title
Title, descending
Year
Year, descending

chip template

Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

Jarek Rettinghouse

Daniel Ho

Julian Ibarz

Yunfei Bai

Kuang-Huei Lee

Ted Xiao

Sangeetha Ramesh

Matt Bennice

Alexander Herzog

Chuyuan Kelly Fu

Adrian Li

Kim Kleiven

Keerthana Gopalakrishnan

Jeff Bingham

Yevgen Chebotar

David Rendleman

Wenlong Lu

Mohi Khansari

Mrinal Kalakrishnan

Ying Xu

Sean Kirmani

Noah Brown

Khem Holden

Kanishka Rao

Justin Vincent

Ryan Julian

Peter Pastor Sampedro

Jessica Lin

David Dovo

Montse Gonzalez Arenas

Daniel Kappler

Mengyuan Yan

Sergey Levine

Jessica Lam

Jonathan Weisz

Paul Wohlhart

Karol Hausman

Cameron Lee

Bob Wei

Yao Lu

2023

Preview abstract We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects. View details

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Alex Irpan

Alexander Herzog

Alexander Toshkov Toshev

Andy Zeng

Anthony Brohan

Brian Andrew Ichter

Byron David

Carolina Parada

Chelsea Finn

Clayton Tan

Diego Reyes

Dmitry Kalashnikov

Eric Victor Jang

Fei Xia

Jarek Liam Rettinghouse

Jasmine Chiehju Hsu

Jornell Lacanlale Quiambao

Julian Ibarz

Kanishka Rao

Karol Hausman

Keerthana Gopalakrishnan

Kuang-Huei Lee

Kyle Alan Jeffrey

Linda Luu

Mengyuan Yan

Michael Soogil Ahn

Nicolas Sievers

Nikhil J Joshi

Noah Brown

Omar Eduardo Escareno Cortes

Peng Xu

Peter Pastor Sampedro

Pierre Sermanet

Rosario Jauregui Ruano

Ryan Christopher Julian

Sally Augusta Jesmonth

Sergey Levine

Steve Xu

Ted Xiao

Vincent Olivier Vanhoucke

Yao Lu

Yevgen Chebotar

Yuheng Kuang

Conference on Robot Learning (CoRL) (2022)

Preview abstract Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of executing long-horizon, abstract, natural-language tasks on a mobile manipulator. The project's website and the video can be found at \url{say-can.github.io}. View details

RL-CycleGAN: Improving Deep-RL Robotics With Simulation-To-Real

Alex Irpan

Chris Harris

Julian Ibarz

Kanishka Rao

Mohi Khansari

Sergey Levine

CVPR 2020 (2020)

Preview abstract Robots trained via reinforcement-learning (RL) requirecollecting and labeling many real-world episodes, whichmay be costly and time-consuming. Training models with alarge amount of simulation is a cheaper alternative. How-ever, simulations are not perfect and such models may nottransfer to the real world. Techniques developed to closethis simulation-to-reality (Sim2Real) gap typically applyrandomization to the simulated images or adapt them withan additional Sim2Real model. A Generative Adversar-ial network (GAN) may be used to adapt the pixels of thesimulated image to be more realistic before use by a deepRL model. We find the CycleGAN which enforces a cycleconsistency between Sim2Real and Real2Sim adaptationsproduces better images for RL than a GAN alone. Ulti-mately, we develop RL-CycleGAN which includes a Cycle-GAN which trains jointly with the deep RL model and en-forces that the RL model is consistent across all the adap-tations.We evaluate the RL-CycleGAN on two vision-based robotics grasping tasks and compare it to previoustechniques. With 580,000 real episodes and millions ofsimulated episodes adapted with RL-CycleGAN achievesxx% grasp success, while a previous GAN-based approach,GraspGAN, achieves xx% grasp success. With only 5,000real episodes, RL-CycleGAN and GraspGAN achieve xx%and xx% grasp success respectively. On a multi-bin grasp-ing task, we show RL-CycleGAN drastically improves dataefficiency requiring 1/xth the amount of real data to reachthe same grasping performance. View details

Federated Learning for Emoji Prediction in a Mobile Keyboard

Francoise Beaufays

Kanishka Rao

Rajiv Mathews

Swaroop Ramaswamy

(2019)

Preview abstract We show that a word-level recurrent neural network can predict emoji from text typed on a mobile keyboard. We demonstrate the usefulness of transfer learning for predicting emoji by pretraining the model using a language modeling task. We also propose mechanisms to trigger emoji and tune the diversity of candidates. The model is trained using a distributed on-device learning framework called federated learning. The federated model is shown to achieve better performance than a server-trained model. This work demonstrates the feasibility of using federated learning to train production-quality models for natural language understanding tasks while keeping users' data on their devices. View details

Federated Learning for Mobile Keyboard Prediction

Andrew Hard

Chloé M Kiddon

Daniel R Ramage

Françoise Simone Beaufays

Hubert Eichner

Kanishka Rao

Rajiv Mathews

Sean Augenstein

Swaroop Ramaswamy

(2019)

Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details

STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES

Yanzhang He

Tara Sainath

Rohit Prabhavalkar

Ian McGraw

Raziel Alvarez

Ding Zhao

David Rybach

Anjuli Kannan

Yonghui Wu

Ruoming Pang

Qiao Liang

Deepti Bhatia

Yuan Shangguan

Bo Li

Golan Pundak

Khe Chai Sim

Tom Bagby

Shuo-yiin Chang

Kanishka Rao

Alex Gruenstein

ICASSP (2019)

Preview abstract End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories. View details

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Chung-Cheng Chiu

Tara Sainath

Yonghui Wu

Rohit Prabhavalkar

Patrick Nguyen

Zhifeng Chen

Anjuli Kannan

Ron J. Weiss

Kanishka Rao

Katya Gonina

Navdeep Jaitly

Bo Li

Jan Chorowski

Michiel Bacchiani

ICASSP (2018) (to appear)

Preview abstract Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In our previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore techniques such as synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500 hour voice search task, we find that the proposed changes improve the WER of the LAS system from 9.2% to 5.6%, while the best conventional system achieve 6.7% WER. We also test both models on a dictation dataset, and our model provide 4.1% WER while the conventional system provides 5% WER. View details

Multilingual Speech Recognition with a Single End-to-End Model

Shubham Toshniwal

Tara N. Sainath

Ron Weiss

Bo Li

Pedro Moreno

Eugene Weinsten

Kanishka Rao

ICASSP (2018)

Preview abstract Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages. View details

Federated Learning for Mobile Keyboard Prediction

Andrew Hard

Chloé M Kiddon

Daniel Ramage

Francoise Beaufays

Hubert Eichner

Kanishka Rao

Rajiv Mathews

Sean Augenstein

(2018)

Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the FederatedAveraging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details

Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models

Hasim Sak

Kanishka Rao

ICASSP 2017 (to appear)

Preview abstract We explore the viability of grapheme-based recognition specifically how it compares to phoneme-based equivalents. We utilize the CTC loss to train models to directly predict graphemes, we also train models with hierarchical CTC and show that they improve on previous CTC models. We also explore how the grapheme and phoneme models scale with large data sets, we consider a single acoustic training data set where we combine various dialects of English from US, UK, India and Australia. We show that by training a single grapheme-based model on this multi-dialect data set we create a accent-robust ASR system View details

1
2

of 3

of 3 pages

Search on Google Scholar

Join us

We're always looking for more talented, passionate people.

See opportunities

Follow us

Explore our other initiatives

Google AI

Discover how Google AI is committed to enriching knowledge and solving complex challenges

Products
Build
Research
Responsibility
Societal Impact
About

Google Cloud

High-performance infrastructure for cloud computing, data analytics & machine learning

Overview
Solutions
Products
Pricing
Resources

Google DeepMind

Our mission is to build AI responsibly to benefit humanity

Models
Research
Science
About

Google Labs

Explore the future of AI responsibly with Google Labs

About
Experiments
Stay connected

Google Products

×