Semi-Automated Named Entity Annotation
Abstract
We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
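The abstract does not say how the high-recall mention set is produced. The sketch below is only an illustration, and it rests on two assumptions: the tagger emits BIO-encoded label sequences, and recall is raised by pooling the mentions found in its k-best outputs. The annotator then gives one yes/no answer per pooled candidate. The names `mentions_from_bio`, `candidate_mentions`, `annotate` and the `is_true_mention` callback, which stands in for the human annotator, are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# A mention is (start token, end token exclusive, entity type).
Mention = Tuple[int, int, str]


def mentions_from_bio(labels: List[str]) -> Set[Mention]:
    """Extract (start, end, type) spans from one BIO label sequence."""
    spans: Set[Mention] = set()
    start, etype = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel "O" closes a trailing span
        # A span ends at "O", at a new "B-", or at an "I-" of a different type.
        if lab == "O" or lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if lab != "O":  # "B-X", or a stray "I-X", opens a new span
                start, etype = i, lab[2:]
    return spans


def candidate_mentions(kbest: List[List[str]]) -> Set[Mention]:
    """Assumed high-recall step: union of mentions over the k-best label sequences."""
    cands: Set[Mention] = set()
    for seq in kbest:
        cands |= mentions_from_bio(seq)
    return cands


def annotate(kbest: List[List[str]],
             is_true_mention: Callable[[Mention], bool]) -> Set[Mention]:
    """Ask one binary question per candidate; keep the mentions judged true."""
    return {m for m in sorted(candidate_mentions(kbest)) if is_true_mention(m)}


if __name__ == "__main__":
    # Two hypothetical k-best outputs for a four-token sentence.
    kbest = [["B-PER", "I-PER", "O", "B-LOC"],
             ["B-PER", "O", "O", "B-ORG"]]
    gold = {(0, 2, "PER"), (3, 4, "LOC")}  # simulated annotator judgments
    print(len(candidate_mentions(kbest)))   # number of binary decisions asked
    print(sorted(annotate(kbest, lambda m: m in gold)))
```

In this toy run the annotator answers four yes/no questions and accepts two of them, so the pooled set trades some extra questions for a better chance that the true mention is among the candidates. Whether this matches the tagger configuration actually used is not stated in the abstract.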