CPL: Counterfactual Prompt Learning for Vision and Language Models

Xuehai He; Diji Yang; Weixi Feng; Tsu-Jui Fu; Arjun R. Akula; Varun Jampani; Pradyumna Narayana; Sugato Basu; William Yang Wang; Xin Eric Wang

Conference on Empirical Methods in Natural Language Processing (EMNLP) (2022)

Abstract

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts.
Towards non-spurious and efficient prompt learning from limited examples, this paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
In particular, CPL constructs counterfactuals by identifying the minimal non-spurious feature change between semantically similar positive and negative samples that causes a concept change, and it learns a more generalizable prompt representation from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL achieves better few-shot performance on different vision and language tasks than previous prompt tuning methods on CLIP. On image classification, we achieve a 3.55% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09% and 25.08% relative improvements, respectively, across three few-shot scenarios on unseen test sets.
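
The two ingredients named in the abstract, counterfactual construction and contrastive learning, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`mix`, `cosine`, `contrastive_loss`), the mask-based blending of a factual and a negative feature, the InfoNCE-style loss, and the temperature value are all assumptions chosen for illustration, and the mask is fixed by hand here rather than learned.

```python
import math

def mix(v, v_neg, u):
    """Blend factual feature v with negative feature v_neg.

    u is a per-dimension mask in [0, 1]; u = 0 keeps v, u = 1 copies v_neg.
    A sparse u stands in for a "minimal" feature change (assumption).
    """
    return [(1.0 - ui) * vi + ui * ni for vi, ni, ui in zip(v, v_neg, u)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(text, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): pull the prompt feature toward
    the factual feature and push it away from counterfactual features."""
    logits = [cosine(text, positive) / temperature]
    logits += [cosine(text, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Toy features: only the first dimension is swapped in from the negative.
v = [1.0, 0.0, 0.5]        # factual image feature (hypothetical)
v_neg = [0.0, 1.0, 0.5]    # semantically similar negative (hypothetical)
u = [1.0, 0.0, 0.0]        # hand-set sparse mask (learned in practice)
v_cf = mix(v, v_neg, u)    # counterfactual feature

text = [1.0, 0.0, 0.5]     # prompt/text feature (hypothetical)
loss = contrastive_loss(text, v, [v_cf])
```

In this toy setting the loss is small because the prompt feature already matches the factual feature much better than the counterfactual one; when the counterfactual is indistinguishable from the factual feature, the loss rises to log 2.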
