Google Research

Following Formulaic Map Instructions in a Street Simulation Environment

Visually Grounded Interaction and Language Workshop (ViGIL) (2018)


We introduce a task and a learning environment for following navigational instructions in Google Street View. We sample ∼100k routes in 100 regions in 10 U.S. cities. For each route, we obtain navigation instructions, build a connected graph of locations and the real-world images available at each location, and extract visual features. Evaluation of existing models shows that this setting offers a challenging benchmark for agents navigating with the help of language cues in real-world outdoor locations. The results also highlight the need for start-of-path orientation descriptions and end-of-path goal descriptions in addition to route descriptions.
