Natural Language Grounded Multitask Navigation

Xin Wang
Vihan Jain
William Wang
Zornitsa Kozareva
Sujith Ravi
NeurIPS Visually Grounded Interaction and Language (ViGIL) (2019)
Abstract

Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, data scarcity is a critical issue in these tasks: collecting human-demonstrated language interactions in the simulator is expensive and time-consuming, and it is impractical to exhaustively collect samples for all variants of the navigation tasks. Therefore, we introduce a generalized multitask navigation model that can be seamlessly trained on language-grounded navigation tasks such as Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH). Benefiting from richer natural language guidance, the multitask model efficiently transfers knowledge across related tasks. Experiments show that it outperforms the single-task model by 7% (success rate) on VLN and 61% (goal progress) on NDH, establishing a new state of the art for NDH.
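To make the multitask setup concrete, below is a minimal sketch of training one shared agent on alternating mini-batches from two language-grounded navigation tasks. This is not the authors' implementation: the SharedNavAgent module, make_fake_loader, and all hyperparameters are hypothetical stand-ins, and random tensors replace real VLN/NDH data.

```python
import torch
import torch.nn as nn

class SharedNavAgent(nn.Module):
    """Toy agent: a shared language encoder feeding a shared policy head."""
    def __init__(self, vocab_size=1000, hidden=64, num_actions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.policy = nn.Linear(hidden, num_actions)  # shared across tasks

    def forward(self, instruction_tokens):
        x = self.embed(instruction_tokens)          # (batch, seq, hidden)
        _, h = self.encoder(x)                      # h: (1, batch, hidden)
        return self.policy(h.squeeze(0))            # action logits

def make_fake_loader(num_batches, vocab_size=1000, seq_len=12, batch=8):
    # Stand-in for a real VLN or NDH data loader: random instruction
    # tokens paired with random expert actions.
    for _ in range(num_batches):
        tokens = torch.randint(0, vocab_size, (batch, seq_len))
        actions = torch.randint(0, 6, (batch,))
        yield tokens, actions

agent = SharedNavAgent()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Alternate mini-batches from the two tasks so supervision from both
# VLN-style and NDH-style guidance updates the same shared weights.
for vln_batch, ndh_batch in zip(make_fake_loader(10), make_fake_loader(10)):
    for tokens, actions in (vln_batch, ndh_batch):
        logits = agent(tokens)
        loss = loss_fn(logits, actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The design choice this sketch illustrates is that both tasks share the encoder and policy parameters, so gradients from the richer dialog guidance in NDH can benefit instruction following in VLN, and vice versa.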