Google Research

Wiki-Conciseness Dataset


Conciseness is a writing principle that aims to remove redundancy from text. Concise rewriting can substantially improve the readability of documents. Despite its importance, this topic is not well studied in natural language processing.

This dataset is a manually curated English evaluation set of concise rewrites for 2,000 Wikipedia sentences. For Concise-Lite (2-way annotated), annotators were asked to make minimal changes to the original sentence, whereas for Concise-Full (5-way annotated), annotators were given the option to make larger rewrites.
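
As a rough illustration of what an n-way annotated example looks like, the sketch below pairs one source sentence with several reference rewrites and measures how much shorter each rewrite is. It assumes that "2-way annotated" means two reference rewrites per source sentence. The class name, field names, and the toy sentence are all hypothetical and invented for this example; they are not taken from the dataset, and the released file format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConcisenessExample:
    """Hypothetical in-memory record: one source sentence, n reference rewrites."""
    source: str
    references: List[str]


def word_count(text: str) -> int:
    # Whitespace tokenization; a simplification chosen for this sketch.
    return len(text.split())


def compression_ratios(example: ConcisenessExample) -> List[float]:
    """Return len(reference) / len(source) in words for each reference."""
    src_len = word_count(example.source)
    if src_len == 0:
        raise ValueError("source sentence is empty")
    return [word_count(ref) / src_len for ref in example.references]


# Invented toy example (NOT from the dataset), shaped like a 2-way annotated item.
toy = ConcisenessExample(
    source="The meeting was postponed due to the fact that the speaker was ill.",
    references=[
        "The meeting was postponed because the speaker was ill.",
        "The meeting was postponed as the speaker was ill.",
    ],
)

ratios = compression_ratios(toy)
for ref, ratio in zip(toy.references, ratios):
    print(f"{ratio:.2f}  {ref}")
```

A ratio below 1.0 means the rewrite is shorter than the source. Under the description above, Concise-Lite rewrites would be expected to stay closer to 1.0 than Concise-Full rewrites, although this sketch does not verify that.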