- Alex Irpan
- Alexander Herzog
- Alexander Toshkov Toshev
- Andy Zeng
- Anthony Brohan
- Brian Andrew Ichter
- Byron David
- Carolina Parada
- Chelsea Finn
- Clayton Tan
- Diego Reyes
- Dmitry Kalashnikov
- Eric Victor Jang
- Fei Xia
- Jarek Liam Rettinghouse
- Jasmine Chiehju Hsu
- Jornell Lacanlale Quiambao
- Julian Ibarz
- Kanishka Rao
- Karol Hausman
- Keerthana Gopalakrishnan
- Kuang-Huei Lee
- Kyle Alan Jeffrey
- Linda Luu
- Mengyuan Yan
- Michael Soogil Ahn
- Nicolas Sievers
- Nikhil J Joshi
- Noah Brown
- Omar Eduardo Escareno Cortes
- Peng Xu
- Peter Pastor Sampedro
- Pierre Sermanet
- Rosario Jauregui Ruano
- Ryan Christopher Julian
- Sally Augusta Jesmonth
- Sergey Levine
- Steve Xu
- Ted Xiao
- Vincent Olivier Vanhoucke
- Yao Lu
- Yevgen Chebotar
- Yuheng Kuang
Abstract
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of executing long-horizon, abstract, natural-language tasks on a mobile manipulator. The project website and videos can be found at say-can.github.io.
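The combination described above can be illustrated with a short sketch: the language model scores each candidate low-level skill by how relevant it is to the instruction so far, a value function scores how likely that skill is to succeed in the current state, and the product of the two determines which skill the robot executes next. The Python sketch below is illustrative only and is not the authors' released implementation; the helper functions `llm_log_prob` and `affordance_value` are hypothetical stand-ins passed in by the caller.

```python
# Minimal sketch of SayCan-style skill selection, assuming hypothetical helpers:
#   llm_log_prob(prompt, skill_text) -> log-likelihood of the skill description
#                                       under a language model
#   affordance_value(skill, state)   -> learned value estimating the skill's
#                                       probability of success in the current state
import math

def select_next_skill(instruction, history, skills, state,
                      llm_log_prob, affordance_value):
    """Pick the skill whose combined LLM and affordance score is highest."""
    steps = ", ".join(history) if history else "none"
    prompt = f"Instruction: {instruction}\nSteps so far: {steps}\nNext step:"
    best_skill, best_score = None, -math.inf
    for skill in skills:
        # Semantic relevance: how likely the LLM is to propose this skill next.
        p_llm = math.exp(llm_log_prob(prompt, skill))
        # Grounding: how likely the skill is to succeed in this environment.
        p_affordance = affordance_value(skill, state)
        score = p_llm * p_affordance
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill
```

In use, this selection step would be repeated, appending each chosen skill to the history, until a terminating skill (e.g. "done") is selected, yielding a long-horizon plan grounded in what the robot can actually do.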