Answer-Me: Multi-Task Open-Vocabulary Learning for Visual Question-Answering

Wei Li
Fred Bertsch
CVPR Workshop (2022)

Abstract

We present Answer-Me, a task-aware multi-task framework that unifies multiple question-answering tasks, such as visual question answering, visual entailment, and visual reasoning. In contrast to previous works using contrastive or generative captioning training, we propose a novel and simple recipe to pretrain a vision-language joint model, which is itself multi-task and uses the entire architecture end-to-end. Our results, obtained in the challenging open-vocabulary generative setting, show state-of-the-art performance, zero-shot generalization, and robustness to forgetting.
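The abstract describes casting several question-answering tasks into a single open-vocabulary, generative formulation and training one model on a multi-task mixture. The sketch below illustrates that idea only at a high level and is not the authors' implementation; the task names, record fields, and mixture weights are assumptions made for illustration.

```python
# Minimal sketch (hypothetical, not the paper's code): map several QA-style
# tasks to a shared (image, input_text, target_text) format so that one
# encoder-decoder model with an open-vocabulary text-generation loss can
# train on a weighted multi-task mixture.
import random


def to_generation_example(task, record):
    """Convert a task-specific record into the shared generative format."""
    if task == "vqa":
        return record["image"], record["question"], record["answer"]
    if task == "visual_entailment":
        # The hypothesis is checked against the image; the target is a label word.
        return record["image"], record["hypothesis"], record["label"]
    if task == "visual_reasoning":
        return record["image"], record["statement"], record["judgement"]
    raise ValueError(f"unknown task: {task}")


def mixture_sampler(datasets, weights, rng=random):
    """Yield examples from a weighted multi-task mixture in the shared format."""
    tasks = list(datasets)
    while True:
        task = rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]
        record = rng.choice(datasets[task])
        yield to_generation_example(task, record)


# Toy usage: every task contributes examples in the same format, so a single
# vision-language model can be trained end-to-end on the combined stream.
datasets = {
    "vqa": [{"image": "img_001", "question": "What color is the bus?", "answer": "yellow"}],
    "visual_entailment": [{"image": "img_002", "hypothesis": "Two dogs are playing.", "label": "entailment"}],
    "visual_reasoning": [{"image": "img_003", "statement": "There are more cats than dogs.", "judgement": "false"}],
}
weights = {"vqa": 0.5, "visual_entailment": 0.25, "visual_reasoning": 0.25}

sampler = mixture_sampler(datasets, weights)
for _, (image, input_text, target_text) in zip(range(3), sampler):
    print(image, "|", input_text, "->", target_text)
```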

Research Areas