- Alex Irpan
- Alexander Herzog
- Alexander Toshkov Toshev
- Andy Zeng
- Anthony Brohan
- Brian Andrew Ichter
- Byron David
- Carolina Parada
- Chelsea Finn
- Clayton Tan
- Diego Reyes
- Dmitry Kalashnikov
- Eric Victor Jang
- Fei Xia
- Jarek Liam Rettinghouse
- Jasmine Chiehju Hsu
- Jornell Lacanlale Quiambao
- Julian Ibarz
- Kanishka Rao
- Karol Hausman
- Keerthana Gopalakrishnan
- Kuang-Huei Lee
- Kyle Alan Jeffrey
- Linda Luu
- Mengyuan Yan
- Michael Soogil Ahn
- Nicolas Sievers
- Nikhil J Joshi
- Noah Brown
- Omar Eduardo Escareno Cortes
- Peng Xu
- Peter Pastor Sampedro
- Pierre Sermanet
- Rosario Jauregui Ruano
- Ryan Christopher Julian
- Sally Augusta Jesmonth
- Sergey Levine
- Steve Xu
- Ted Xiao
- Vincent Olivier Vanhoucke
- Yao Lu
- Yevgen Chebotar
- Yuheng Kuang
Abstract
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of executing long-horizon, abstract, natural-language tasks on a mobile manipulator. The project website and videos can be found at say-can.github.io.
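The combination described above can be illustrated with a short sketch: the language model scores each candidate low-level skill by how relevant it is to the instruction so far, a value function scores how likely that skill is to succeed in the current state, and the product of the two determines which skill the robot executes next. The Python sketch below is illustrative only and is not the authors' released implementation; the helper functions `llm_log_prob` and `affordance_value` are hypothetical stand-ins passed in by the caller.

```python
# Minimal sketch of SayCan-style skill selection, assuming hypothetical helpers:
#   llm_log_prob(prompt, skill_text) -> log-likelihood of the skill description
#                                       under a language model
#   affordance_value(skill, state)   -> learned value estimating the skill's
#                                       probability of success in the current state
import math

def select_next_skill(instruction, history, skills, state,
                      llm_log_prob, affordance_value):
    """Pick the skill whose combined LLM and affordance score is highest."""
    steps = ", ".join(history) if history else "none"
    prompt = f"Instruction: {instruction}\nSteps so far: {steps}\nNext step:"
    best_skill, best_score = None, -math.inf
    for skill in skills:
        # Semantic relevance: how likely the LLM is to propose this skill next.
        p_llm = math.exp(llm_log_prob(prompt, skill))
        # Grounding: how likely the skill is to succeed in this environment.
        p_affordance = affordance_value(skill, state)
        score = p_llm * p_affordance
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill
```

In use, this selection step would be repeated, appending each chosen skill to the history, until a terminating skill (e.g. "done") is selected, yielding a long-horizon plan grounded in what the robot can actually do.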