Google Research



Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models.

We created a set of rules that identify a diverse range of discourse phenomena in raw text and use them to decompose the text into two independent sentences.
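As a rough illustration of what one such rule might look like, the sketch below splits a sentence at a comma followed by a coordinating connective (but, so, yet) and records the marker it removed. The rule, the `decompose` function and the `Decomposition` type are hypothetical. They are not the rules used to build the dataset, which cover many more phenomena than this single pattern.

```python
import re
from typing import NamedTuple, Optional


class Decomposition(NamedTuple):
    first: str
    second: str
    connective: str  # discourse marker removed during the split


# Hypothetical single rule: a comma followed by a coordinating connective.
# "and" is left out because it would also split plain lists ("a, b, and c").
_CONNECTIVE_RULE = re.compile(r"^(?P<left>.+?), (?P<conn>but|so|yet) (?P<right>.+)$")


def _as_sentence(fragment: str) -> str:
    """Capitalize the first character and make sure the fragment ends with a period."""
    fragment = fragment.strip()
    fragment = fragment[0].upper() + fragment[1:]
    return fragment if fragment.endswith((".", "!", "?")) else fragment + "."


def decompose(text: str) -> Optional[Decomposition]:
    """Split one sentence into two independent sentences, or return None if the rule does not fire."""
    match = _CONNECTIVE_RULE.match(text.strip())
    if match is None:
        return None
    return Decomposition(
        first=_as_sentence(match.group("left")),
        second=_as_sentence(match.group("right")),
        connective=match.group("conn"),
    )


print(decompose("The team played well, but they lost the final."))
# Decomposition(first='The team played well.', second='They lost the final.', connective='but')
```

Keeping the removed connective alongside the two sentences is what lets a model be trained to put it back when fusing them.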

The approach has been applied to two document collections, Wikipedia and sports articles, yielding 60 million fusion examples annotated with the discourse information required to reconstruct the fused text.
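One way to picture a single fusion example is as a record that pairs the two input sentences with the original text and its discourse annotation. The field names below (`FusionExample`, `discourse_type`, `connective`) are assumptions for illustration, not the schema of the released data. For the connective case, the annotation is enough to rebuild the target, as the hypothetical `fuse_with_connective` helper shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionExample:
    """Hypothetical training example: a model sees the two sentences and must produce fused_text."""
    first_sentence: str
    second_sentence: str
    fused_text: str       # original text before decomposition (the training target)
    discourse_type: str   # hypothetical label for the phenomenon, e.g. "connective"
    connective: str       # discourse marker removed during decomposition ("" if none)


def fuse_with_connective(example: FusionExample) -> str:
    """Rebuild the fused text for the connective case by re-inserting the marker.

    Assumes the second sentence starts with a word that is lowercase mid-sentence;
    a proper noun at that position would need extra handling.
    """
    first = example.first_sentence.rstrip(".")
    second = example.second_sentence
    second = second[0].lower() + second[1:]
    return f"{first}, {example.connective} {second}"


example = FusionExample(
    first_sentence="The team played well.",
    second_sentence="They lost the final.",
    fused_text="The team played well, but they lost the final.",
    discourse_type="connective",
    connective="but",
)
print(fuse_with_connective(example) == example.fused_text)  # True
```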