A Comparison of Features for Automatic Readability Assessment

Lijun Feng; Martin Jansche; Matt Huenerfauth; Noémie Elhadad

23rd International Conference on Computational Linguistics (COLING 2010), Poster Volume, pp. 276-284

Abstract

Several sets of explanatory variables – including shallow, language modeling, POS, syntactic, and discourse features – are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity-density (a discourse feature) and POS-features, in particular nouns, are individually very useful but highly correlated. Average sentence length (a shallow feature) is more useful – and less expensive to compute – than individual syntactic features. A judicious combination of features examined here results in a significant improvement over the state of the art.
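As a rough illustration of why average sentence length is cheap to compute, the sketch below derives it with two regular expressions. It is not the authors' implementation: the function name `average_sentence_length`, the sentence-splitting rule, and the token definition are all assumptions made for this example.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of word tokens per sentence, using naive splitting."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    # Assumption: a token is a maximal run of letters, digits, or apostrophes.
    token_counts = [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]
    return sum(token_counts) / len(sentences)


if __name__ == "__main__":
    sample = "The cat sat. The dog ran away quickly!"
    print(average_sentence_length(sample))
```

No parser, tagger, or language model is needed here, which is the cost advantage the abstract points to; a real system would likely use a proper sentence splitter and tokenizer instead of these regular expressions.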

Research Areas

Natural language processing
