Semi-Automated Named Entity Annotation

Mark Mandel
Steven Carroll
Peter White
Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics (2007), pp. 53-56

Abstract

We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
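
The binary-decision workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Mention`, `propose_mentions`, and `review` are hypothetical, and the high-recall tagger is replaced here by a simple lexicon lookup rather than a sequence model trained with k-best MIRA. The only point it shows is that the annotator answers yes or no for each proposed mention instead of marking spans from scratch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    start: int   # index of the first token (inclusive)
    end: int     # index one past the last token (exclusive)
    label: str   # entity type proposed by the tagger


def propose_mentions(tokens: List[str], lexicon: Dict[str, str]) -> List[Mention]:
    """Stand-in for a high-recall tagger (assumption: single-token lexicon lookup).

    It deliberately over-generates: every token found in the lexicon becomes
    a candidate, so false positives are expected.
    """
    return [
        Mention(i, i + 1, lexicon[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in lexicon
    ]


def review(
    tokens: List[str],
    candidates: List[Mention],
    decide: Callable[[List[str], Mention], bool],
) -> List[Mention]:
    """Keep only the candidates the annotator accepts.

    `decide` models the annotator's binary judgment: True for a true
    mention, False for a false positive.
    """
    return [m for m in candidates if decide(tokens, m)]
```

In this sketch the accepted mentions would be added to the seed training corpus; how the tagger is retrained on them is outside what the sketch covers.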