Google Research

Source-summary Entity Aggregation in Abstractive Summarization

COLING 2022 (to appear)


In a discourse, specific entities that are mentioned can later be referred to by a more general description. For example, 'Celine Dion' and 'Justin Bieber' can be referred to by 'Canadian singers' or 'celebrities'. In this work, we study this phenomenon in the context of summarization, where entities drawn from a source text are generalized in the summary. We call such instances 'source-summary entity aggregations'. We categorize and study several types of source-summary entity aggregations in the CNN/Dailymail corpus, showing that they are reasonably frequent. We experimentally analyze the capabilities of three state-of-the-art summarization systems for generating such aggregations within summaries. We also explore how they can be encouraged to generate more aggregations. Our results show that there is significant room for improvement in generating semantically correct and appropriate aggregations.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work