Semi-Automated Named Entity Annotation

Mark Mandel
Steven Carroll
Peter White
Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics (2007), pp. 53-56

Abstract

We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
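
The binary-decision workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` class, the `filter_candidates` function, the example sentence, and the simulated annotator are all hypothetical, and the candidate list is hard-coded rather than produced by a sequence model trained with k-best MIRA. The sketch only shows the step in which an annotator accepts or rejects each candidate mention.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Mention:
    """A candidate entity mention (hypothetical structure for illustration)."""
    start: int  # index of the first token in the mention
    end: int    # index one past the last token
    label: str  # entity type proposed by the tagger


def filter_candidates(
    tokens: Sequence[str],
    candidates: Sequence[Mention],
    is_true_mention: Callable[[Sequence[str], Mention], bool],
) -> List[Mention]:
    """Keep only the candidates that the annotator marks as true mentions.

    Each call to is_true_mention stands for one binary (yes/no) decision.
    """
    return [m for m in candidates if is_true_mention(tokens, m)]


# Assumed example data: a high-recall tagger would over-generate candidates,
# so some of them are expected to be false positives.
tokens = "The BRCA1 gene is studied in Philadelphia labs".split()
candidates = [
    Mention(1, 2, "GENE"),  # "BRCA1"
    Mention(1, 3, "GENE"),  # "BRCA1 gene" (wrong boundary in this example)
    Mention(6, 7, "GENE"),  # "Philadelphia" (wrong type in this example)
]

# A simulated annotator that answers from a small set of correct mentions.
# In practice this would be a person making each yes/no decision.
gold = {(1, 2, "GENE")}


def simulated_annotator(tokens: Sequence[str], m: Mention) -> bool:
    return (m.start, m.end, m.label) in gold


accepted = filter_candidates(tokens, candidates, simulated_annotator)
```

In this sketch the annotator makes one decision per candidate and never marks span boundaries or types from scratch, which is the source of the effort reduction the abstract reports.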
