Reinforcing an Image Caption Generator using Off-line Human Feedback

Paul Hongsuck Seo

Piyush Sharma

Tomer Levinboim

Radu Soricut

AAAI 2020

Abstract

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet in most cases the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the number of caption ratings is several orders of magnitude smaller than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, using a sampling distribution that focuses on the captions present in a caption-ratings dataset. We present empirical evidence indicating that our models learn to generalize the human raters’ judgments in the caption-ratings training data to a previously unseen set of images, as judged by a different set of human judges and, additionally, by a different, multi-dimensional side-by-side human evaluation procedure.
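To make the idea of using stored human ratings as rewards concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical categorical "policy" over four candidate captions for a single image (a real captioner would be a neural sequence model), a hypothetical list `rated` of (caption index, rating) pairs standing in for the caption-ratings dataset, and a fixed `baseline` of 0.5 chosen purely for illustration. Updates are computed only from captions that already have a rating, rather than from fresh samples of the model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def offline_pg_step(logits, rated, lr=0.5, baseline=0.5):
    """One policy-gradient step using only pre-collected (index, rating) pairs.

    All names and default values here are illustrative assumptions.
    """
    probs = softmax(logits)
    grad = np.zeros_like(logits)
    for idx, rating in rated:
        # For a softmax policy, d log pi(idx) / d logits = one_hot(idx) - probs.
        one_hot = np.zeros_like(logits)
        one_hot[idx] = 1.0
        # Weight the log-likelihood gradient by the baseline-shifted human rating.
        grad += (rating - baseline) * (one_hot - probs)
    grad /= len(rated)
    # Gradient ascent on the expected rating.
    return logits + lr * grad

# Toy policy over 4 candidate captions, initially uniform.
logits = np.zeros(4)
# Hypothetical ratings: caption 0 was rated good (1.0), caption 2 bad (0.0).
rated = [(0, 1.0), (2, 0.0)]

for _ in range(50):
    logits = offline_pg_step(logits, rated)

probs = softmax(logits)
print(probs)
```

In this toy setting the probability of the highly rated caption rises and that of the poorly rated caption falls, while unrated captions are affected only indirectly through normalization. How well such updates carry over to unseen images is the empirical question the paper addresses.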
