Google Research

On Faithfulness and Factuality in Abstractive Summarization

Proceedings of The 58th Annual Meeting of the Association for Computational Linguistics (ACL) (2020)

Abstract

It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models are fundamentally flawed and lead to dull and repetitive responses. We found that these models when tested on abstractive summarization are highly prone to hallucinate content that is either unfaithful to the input document, completely irrelevant or gibberish. We conduct a large scale human evaluation of several state of the art neural abstractive summarization systems including pretrained language models to better understand the types of hallucinations. Furthermore, we study the extent to which the hallucinated content (i) co-occurs with the common linguistic irregularities such as repetition and incoherence, and (ii) can be measured by NLU measures such as textual entailment, question answering and OpenIE-based fact checking.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work