Google Research

Multimodal Storytelling via Generative Adversarial Imitation Learning

The Twenty-Sixth International Joint Conference on Artificial Intelligence (2017), pp. 3967-3973

Abstract

Deriving event storylines is an effective summarization method for succinctly organizing extensive information, and it can significantly alleviate information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users’ interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users’ preferences. In addition, their reliance on a single-modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via Generative Adversarial Networks (MIL-GAN), to directly model users’ interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users’ demonstrated storylines. The model is designed to learn the reward patterns from user-provided storylines and then apply the learned policy to unseen data. The proposed approach is demonstrated to be capable of acquiring the user’s implicit intent and to outperform competing methods by a substantial margin in a user study.
