A Dataset and Architecture for Visual Reasoning with a Working Memory

Robert Guangyu Yang
Igor Ganichev
Xiao Jing Wang
Jonathon Shlens
David Sussillo
ECCV (2018)


A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli, such as in video analysis or game play. Inspired by cognitive psychology and neuroscience, which have a rich tradition of studying both visual reasoning and memory, we developed a configurable visual question-and-answer dataset (COG). COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory, problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs at a state-of-the-art level on the CLEVR dataset and performs well on easy settings of the COG dataset, but struggles on harder settings. Preliminary analyses of the network demonstrate that it accomplishes the task in ways that are interpretable to humans.