Context-aware Captions from Context-agnostic Supervision

Shanmukha Ramakrishna Vedantam

Samy Bengio

Kevin Murphy

Devi Parikh

Gal Chechik

CVPR (2017)

Abstract

We describe a model to induce discriminative image captions based only on generative ground-truth training data. For example, given images and descriptions of “zebras” and “horses”, our system can generate discriminative language that describes the zebra images while capturing the differences with the “horse” images. Producing discriminative language is a foundational problem in the study of pragmatic behavior: humans can effortlessly repurpose language to be persuasive and effective in communication. We first propose a novel inference procedure based on a reflex speaker and an introspector to induce discrimination between concepts. Intuitively, the reflex speaker models a good utterance for some concept (“zebra”), while the introspector models how discriminative the sentence is between the concepts (“zebra” and “horse”). Unlike previous approaches, the form of our listener has the attractive property of being amenable to joint approximate inference to select utterances that satisfy both the speaker and the introspector, yielding an introspective speaker. We apply our introspective speaker to the CUB-Text dataset to describe why an image contains a particular bird category as opposed to some other closely related bird category, and to the MS COCO dataset to generate language that points to one out of two semantically similar images. Evaluations with discriminative ground truth collected on CUB and with humans on MS COCO reveal that our approach outperforms baseline approaches for discrimination. We then draw qualitative insights from our model outputs, which suggest that in some cases one may interpret the introspective speaker outputs to be lies in service of the higher goal of discrimination.
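The idea of trading off a reflex speaker against an introspector can be illustrated with a small sketch. The abstract does not give the scoring rule, so everything here is an assumption made for illustration: the introspector is taken to be a log-likelihood ratio between the target and distractor concepts, the trade-off weight `lam` is hypothetical, and the candidate sentences with their probabilities are invented toy values rather than model outputs. It also ranks a fixed candidate list instead of performing the joint approximate inference the abstract mentions.

```python
import math

def introspective_score(logp_target, logp_distractor, lam=0.5):
    """Combine a reflex-speaker term with an introspector term.

    Assumption (not from the abstract): the introspector is modeled as
    the log-likelihood ratio between target and distractor concepts.
    lam = 1.0 reduces to the reflex speaker alone; smaller values put
    more weight on discrimination.
    """
    reflex = logp_target                          # how good the sentence is for the target
    introspector = logp_target - logp_distractor  # how well it separates the two concepts
    return lam * reflex + (1.0 - lam) * introspector

def pick_caption(candidates, lam=0.5):
    """Return the sentence with the highest combined score.

    Each candidate is (sentence, log p(sentence | target),
    log p(sentence | distractor)).
    """
    best = max(candidates, key=lambda c: introspective_score(c[1], c[2], lam))
    return best[0]

# Toy candidates for target "zebra" versus distractor "horse".
# The probabilities are made up for the example.
candidates = [
    ("an animal standing in a field", math.log(0.30), math.log(0.30)),
    ("an animal with black and white stripes", math.log(0.20), math.log(0.01)),
]

generic = pick_caption(candidates, lam=1.0)         # reflex speaker only
discriminative = pick_caption(candidates, lam=0.5)  # speaker plus introspector
print(generic)
print(discriminative)
```

With these toy numbers, the reflex speaker alone prefers the generic sentence because it is the more likely description of the target, while the combined score prefers the sentence that is much less likely under the distractor.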
