- AJ Piergiovanni
- Wei Li
- Weicheng Kuo
- Mohammad Taghi Saffar
- Fred Bertsch
- Anelia Angelova
CVPR Workshop (2022)
We present Answer-Me, a task-aware multi-task framework that unifies multiple question answering tasks, such as visual question answering, visual entailment, and visual reasoning. In contrast to previous works that use contrastive or generative captioning training, we propose a novel and simple recipe to pretrain a vision-language joint model, which is also multi-task and uses the entire architecture end-to-end. Our results, obtained in the challenging open-vocabulary generative setting, show state-of-the-art performance, zero-shot generalization, and robustness to forgetting.