Following Formulaic Map Instructions in a Street Simulation Environment

Volkan Cirik
Yuan Zhang
Visually Grounded Interaction and Language Workshop (ViGIL) (2018)


We introduce a task and a learning environment for following navigational instructions in Google Street View. We sample ∼100k routes across 100 regions in 10 U.S. cities. For each route, we obtain navigation instructions, build a connected graph of locations and the real-world images available at each location, and extract visual features. Evaluation of existing models shows that this setting offers a challenging benchmark for agents navigating with the help of language cues in real-world outdoor locations. The results also highlight the need for start-of-path orientation descriptions and end-of-path goal descriptions in addition to route descriptions.
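
As a rough illustration of the "connected graph of locations and the real-world images available at each location" mentioned above, the sketch below stores each location as a node with a list of image identifiers and a set of neighbors, then checks connectivity with a breadth-first search. The `Location` schema, the helper names, and the toy node and image identifiers are assumptions made for illustration only; they are not taken from the paper or from any Street View API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Location:
    """One graph node (hypothetical schema): a place and the images available there."""
    node_id: str
    image_ids: list[str] = field(default_factory=list)
    neighbors: set[str] = field(default_factory=set)


def add_edge(graph: dict[str, Location], a: str, b: str) -> None:
    """Link two locations in both directions."""
    graph[a].neighbors.add(b)
    graph[b].neighbors.add(a)


def is_connected(graph: dict[str, Location]) -> bool:
    """Breadth-first search from an arbitrary node; True if every node is reached."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph[current].neighbors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(graph)


def build_example() -> dict[str, Location]:
    """Toy three-node route graph with one made-up image id per node."""
    graph = {n: Location(n, [f"{n}_pano"]) for n in ("a", "b", "c")}
    add_edge(graph, "a", "b")
    add_edge(graph, "b", "c")
    return graph
```

In this sketch a route would be a sequence of node identifiers whose consecutive pairs are neighbors; the visual features extracted per image could be stored alongside `image_ids`, which is left out here to keep the example small.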