PEGASUS: Pretraining with Extracted Gap-sentences for Abstractive Summarization by Sequence-to-sequence Models

Jingqing Zhang; Yao Zhao; Mohammad Ahmad Saleh; Peter J. Liu

(2020)

Abstract

Previous development of abstractive summarization was constrained by the need for large-scale, high-quality supervised summarization datasets. Recent work on the Transformer model and pretraining techniques has shown great success in various NLP tasks, including text summarization. However, none of this work has explored pretraining techniques tailored specifically for abstractive text summarization; furthermore, there is a lack of systematic evaluation of abstractive summarization across broad domains. In this work, we propose Pretraining using Extracted Gap-sentences for Abstractive SUmmarization by Sequence-to-sequence models (PEGASUS). Specifically, we use extractive strategies to select and mask principal sentences in a document, and the sequence-to-sequence model is pretrained to generate the masked sentences. We evaluate PEGASUS on 12 downstream summarization datasets spanning the news, science, technology, medical, social networking, instructions, corporate email and legal domains. Experiments demonstrate that PEGASUS achieves state-of-the-art performance on all 12 downstream summarization datasets, as measured by ROUGE scores. PEGASUS also shows surprising capability in low-resource settings, achieving state-of-the-art (SOTA) or near-SOTA results on x out of 12 tasks using only 100 finetuning examples.
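The gap-sentence objective can be illustrated with a minimal Python sketch that turns one document into an (input, target) pretraining pair. The abstract does not say which extractive strategy selects the principal sentences, so this sketch assumes one plausible choice: score each sentence by its ROUGE-1 F1 overlap with the rest of the document and mask the top-scoring ones. The regex sentence splitter, the 30% `gap_ratio`, the `<mask_1>` placeholder token and all helper names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sentinel that replaces each selected gap sentence in the input.
MASK_TOKEN = "<mask_1>"


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokens(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate, reference):
    # Unigram ROUGE-1 F1. Since P = o/|c| and R = o/|r|, F1 = 2o / (|c| + |r|).
    c, r = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((c & r).values())  # clipped unigram matches
    total = sum(c.values()) + sum(r.values())
    return 2 * overlap / total if total else 0.0


def make_gap_sentence_example(document, gap_ratio=0.3):
    """Return (model_input, target) for one document (illustrative sketch)."""
    sents = split_sentences(document)
    num_gaps = max(1, round(len(sents) * gap_ratio))
    # Score each sentence against the document with that sentence removed.
    scores = [
        rouge1_f1(s, " ".join(sents[:i] + sents[i + 1:]))
        for i, s in enumerate(sents)
    ]
    # Stable sort by descending score: ties keep the earlier sentence first.
    ranked = sorted(range(len(sents)), key=lambda i: -scores[i])
    selected = set(ranked[:num_gaps])
    model_input = " ".join(
        MASK_TOKEN if i in selected else s for i, s in enumerate(sents)
    )
    # Target: the masked sentences, concatenated in document order.
    target = " ".join(sents[i] for i in sorted(selected))
    return model_input, target


if __name__ == "__main__":
    doc = "Cats and dogs play. Cats sleep. Dogs run. Birds fly."
    print(make_gap_sentence_example(doc))
```

On this toy document, "Cats and dogs play." shares the most words with the other sentences, so it is replaced by the mask token in the input and becomes the generation target. A real pipeline would additionally tokenize both strings for the model and feed the pair to a sequence-to-sequence trainer.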
