I am a research scientist at Google Research, working mainly at the intersection of computer vision and natural language processing. Previously, I was a research scientist at Georgia Tech. My most recent work has focused on embodied vision-and-language agents operating in complex 3D environments. My research interests also include image captioning, visual question answering (VQA), and multimodal AI tasks more broadly. I completed my PhD at the Australian National University in 2018. See my personal webpage for more information.