Data Strategies for Low-Resource Grammatical Error Correction

Simon Flachs

Felix Stahlberg

Shankar Kumar

Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, ACL, https://sig-edu.org/bea/current (2021)

Abstract

Grammatical Error Correction (GEC) is a task that has been extensively investigated for the English language. However, for low-resource languages, best practices for training GEC systems have not yet been systematically determined. We investigate how best to take advantage of existing data sources to improve GEC systems for languages with limited quantities of high-quality training data. In particular, we compare methods for generating artificial error data to train GEC systems, and show that these methods can benefit from including morphological errors. We then examine the usefulness of noisy error correction data gathered from Wikipedia and the language learning website Lang8, and demonstrate that, despite their inherent noise, these are valuable data sources. Finally, we show that GEC systems pre-trained on the noisy data sources can be fine-tuned effectively using small amounts of high-quality, human-annotated data.
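As a rough illustration of what generating artificial error data with morphological errors could look like, the sketch below corrupts clean sentences to produce (noisy source, clean target) training pairs. It is not the authors' method: the `INFLECTIONS` table, the `corrupt` and `make_pairs` functions, the choice of edit operations, and the default `rate` are all assumptions made for this example. A real system would draw alternative word forms from a morphological lexicon for the target language.

```python
import random

# Hypothetical toy inflection table: surface form -> other forms of the same lemma.
# Assumption for illustration only; not taken from the paper.
INFLECTIONS = {
    "walk": ["walks", "walked", "walking"],
    "walks": ["walk", "walked", "walking"],
    "cat": ["cats"],
    "cats": ["cat"],
}


def corrupt(tokens, rng, rate=0.15, vocab=None, inflections=INFLECTIONS):
    """Return a noised copy of `tokens`; each token is corrupted with probability `rate`."""
    vocab = vocab or sorted(inflections)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() >= rate:
            # Leave this token untouched.
            out.append(tok)
        elif tok in inflections:
            # Morphological error: swap in another form of the same word.
            out.append(rng.choice(inflections[tok]))
        else:
            # Generic token-level error for words without known alternative forms.
            op = rng.choice(["delete", "insert", "swap"])
            if op == "insert":
                out.extend([tok, rng.choice(vocab)])
            elif op == "swap" and i + 1 < len(tokens):
                out.extend([tokens[i + 1], tok])
                i += 1  # the following token has been consumed by the swap
            elif op == "swap":
                out.append(tok)  # nothing to swap with at the end of the sentence
            # op == "delete": append nothing
        i += 1
    return out


def make_pairs(sentences, seed=0, rate=0.15):
    """Build (corrupted, clean) sentence pairs from whitespace-tokenized clean text."""
    rng = random.Random(seed)
    return [(" ".join(corrupt(s.split(), rng, rate)), s) for s in sentences]
```

Calling `make_pairs(["the cat walks home"], seed=0)` returns a list with one (corrupted, clean) pair. Under the pre-train-then-fine-tune recipe the abstract describes, synthetic pairs of this kind would be training material used before the small human-annotated set.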

Research Areas

Natural Language Processing
