Answer-Me: Multi-Task Open-Vocabulary Learning for Visual Question-Answering

AJ Piergiovanni; Wei Li; Weicheng Kuo; Mohammad Taghi Saffar; Fred Bertsch; Anelia Angelova

CVPR Workshop (2022)

Abstract

We present Answer-Me, a task-aware multi-task framework that unifies multiple question-answering tasks, such as visual question answering, visual entailment, and visual reasoning. In contrast to previous works that use contrastive or generative captioning training, we propose a novel and simple recipe for pretraining a joint vision-language model; the pretraining is itself multi-task and uses the entire architecture end-to-end. Our results, obtained in the challenging open-vocabulary generative setting, show state-of-the-art performance, zero-shot generalization, and robustness to forgetting.

Research Areas

Machine perception
