Google Research

A Dataset and Architecture for Visual Reasoning with a Working Memory

  • Robert Guangyu Yang
  • Igor Ganichev
  • Xiao Jing Wang
  • Jonathon Shlens
  • David Sussillo
ECCV (2018)


A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli, such as those in video analysis or game play. Inspired by cognitive psychology and neuroscience, which have a rich tradition of studying both visual reasoning and memory, we developed a configurable visual question-answering dataset (COG). COG is much simpler than the general problem of video analysis, yet it addresses many problems of visual and logical reasoning and memory that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs at a state-of-the-art level on the CLEVR dataset and performs well on easy settings of the COG dataset, but struggles at harder levels. Preliminary analyses demonstrate that the network accomplishes the task in ways that are interpretable to humans.
