Decoupled Novel Object Captioner

Yu Wu; Linchao Zhu; Lu Jiang; Yi Yang

Decoupled Novel Object Captioner

Yu Wu

Linchao Zhu

Lu Jiang

Yi Yang

ACM Multimedia (2018)

Download Google Scholar

Abstract

Image captioning is a challenging task where the machine automatically describes an image by sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model can hardly be applied to a new domain in which some novel object categories exist, i.e., the objects and their description words are unseen during model training. To correctly caption the novel object, it requires professional human workers to annotate the images by sentences with the novel words. It is labor expensive and thus limits its usage in real-world applications.

In this paper, we introduce the zero-shot novel object captioning task where the machine generates descriptions without extra training sentences about the novel object. To tackle the challenging problem, we propose a Decoupled Novel Object Captioner (DNOC) framework that can fully decouple the language sequence model from the object descriptions. DNOC has two components. 1) A Sequence Model with the Placeholder (SM-P) generates a sentence containing placeholders. The placeholder represents an unseen novel object. Thus, the sequence model can be decoupled from the novel object descriptions. 2) A key-value object memory built upon the freely available detection model, contains the visual
information and the corresponding word for each object. A query generated from the SM-P is used to retrieve the words from the object memory. The placeholder will further be filled with the correct word, resulting in a caption with novel object descriptions. The experimental results on the held-out MSCOCO dataset demonstrate the ability of DNOC in describing novel concepts.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Decoupled Novel Object Captioner

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs