Description
Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models.
We have created a set of rules that identify a diverse range of discourse phenomena in raw text and use them to decompose the text into two independent sentences.
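The description does not list the rules themselves. As a rough illustration only, the sketch below implements one hypothetical rule for a single discourse phenomenon: an inner-sentence connective of the form "A, but B". The connective list, the regular expression, and the function name `split_on_connective` are assumptions made for this example, not the rules used to build the dataset.

```python
import re
from typing import Optional, Tuple

# Hypothetical connective list; the dataset's actual rule set is not given here.
CONNECTIVES = ("but", "and", "so", "yet")

# One hypothetical rule: "<clause A>, <connective> <clause B>" -> two sentences.
_PATTERN = re.compile(
    r"^(?P<first>[^,]+), (?P<conn>" + "|".join(CONNECTIVES) + r") (?P<second>.+)$"
)


def split_on_connective(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (first_sentence, second_sentence, connective), or None if the rule does not fire."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    first = match.group("first").strip() + "."
    second = match.group("second").strip()
    # Capitalize the second clause so that it can stand alone as a sentence.
    second = second[0].upper() + second[1:]
    if not second.endswith((".", "!", "?")):
        second += "."
    return first, second, match.group("conn")
```

For example, `split_on_connective("The team played well, but they lost the final.")` yields `("The team played well.", "They lost the final.", "but")`, while a sentence without the pattern returns `None`. A real rule set would also have to handle phenomena such as anaphora, which this single pattern ignores.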
The approach has been applied to two document collections, Wikipedia and sports articles, yielding 60 million fusion examples annotated with the discourse information required to reconstruct the fused text.
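To make the role of the discourse annotation concrete, here is a minimal sketch of what one example might carry and how the annotation could be used to rebuild the fused text. The record layout, the field names, the `"inner_connective"` label, and the `reconstruct` function are all hypothetical and are not the dataset's actual schema. The sketch covers only the connective case.

```python
from dataclasses import dataclass


@dataclass
class FusionExample:
    """Hypothetical record layout; field names are illustrative, not the dataset's schema."""
    first_sentence: str   # first independent sentence (model input)
    second_sentence: str  # second independent sentence (model input)
    discourse_type: str   # hypothetical label, e.g. "inner_connective"
    connective: str       # connective removed during decomposition, "" if none


def reconstruct(example: FusionExample) -> str:
    """Rebuild the fused text for the hypothetical inner-connective case only."""
    if example.discourse_type != "inner_connective":
        raise NotImplementedError("only the inner-connective case is sketched here")
    first = example.first_sentence.rstrip(".")
    second = example.second_sentence
    # Undo the capitalization applied during decomposition. This naive
    # lowercasing would be wrong if the second sentence began with a proper noun.
    second = second[0].lower() + second[1:]
    return f"{first}, {example.connective} {second}"
```

With `FusionExample("The team played well.", "They lost the final.", "inner_connective", "but")`, `reconstruct` returns `"The team played well, but they lost the final."`. This shows why the connective must be stored: without it, the two input sentences alone do not determine the original text.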