Google Research

Natural Language Grounded Multitask Navigation

NeurIPS Visually Grounded Interaction and Language (ViGIL) (2019)

Abstract

Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, data scarcity is a critical issue in these tasks: collecting human-demonstrated language interactions in the simulator is still expensive and time-consuming, and it is impractical to exhaustively collect samples for every variant of the navigation task. Therefore, we introduce a generalized multitask navigation model that can be seamlessly trained on language-grounded navigation tasks such as Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH). Benefiting from richer natural language guidance, the multitask model efficiently transfers knowledge across related tasks. Experiments show that it outperforms the single-task model by 7% (success rate) on VLN and 61% (goal progress) on NDH, establishing a new state of the art for NDH.
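The abstract describes training one shared agent on a mixture of VLN and NDH episodes. As a rough illustration of that idea, the sketch below interleaves samples from two task pools into mixed batches for a single model; the function name, toy data, and sampling scheme are hypothetical and not taken from the paper.

```python
import random

def mixed_task_batches(vln_data, ndh_data, batch_size, steps, seed=0):
    """Yield batches mixing both tasks, so one shared agent sees
    VLN instructions and NDH dialog histories in the same stream.
    (Illustrative only; not the paper's actual training procedure.)"""
    rng = random.Random(seed)
    pools = {"VLN": vln_data, "NDH": ndh_data}
    for _ in range(steps):
        batch = []
        for _ in range(batch_size):
            task = rng.choice(["VLN", "NDH"])  # uniform task sampling
            batch.append((task, rng.choice(pools[task])))
        yield batch

# Hypothetical toy data: each item stands in for a (language, trajectory) pair.
vln = [f"vln_episode_{i}" for i in range(5)]
ndh = [f"ndh_episode_{i}" for i in range(5)]
for batch in mixed_task_batches(vln, ndh, batch_size=4, steps=2):
    print(batch)
```

In practice, a shared language and vision encoder would consume each batch, with the task tag indicating which kind of guidance (instruction vs. dialog history) the sample carries.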
