Penn/UMass/CHOP Biocreative II systems

Koby Crammer
Fernando Pereira
Gideon Mann
Kedar Bellare
Andrew McCallum
Steven Carroll
Yang Jin
Peter White
Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007), pp. 119-124

Abstract

Our team participated in the entity tagging and normalization tasks of Biocreative II. For the
entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically
derived word clusters. MIRA accommodates different training loss functions, which allowed us to
exploit gene alternatives in training. We also performed a greedy search over feature templates
and the development data, achieving a final F-measure of 86.28%. For the normalization task, we
proposed a new specialized on-line learning algorithm and applied it to filter out false positives
from a high-recall list of candidates. For normalization, we achieved an F-measure of 69.8%.
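
To illustrate how a MIRA-style learner can accept an arbitrary training loss, the following is a minimal sketch of the simpler 1-best update, not the k-best variant used in our system (which solves a small quadratic program over several candidate outputs). The function names, the sparse-dictionary feature representation, and the clipping constant `C` are assumptions made for illustration only. The loss is passed in as a number, so any task-specific loss, for example one that does not penalize a prediction matching an accepted gene alternative, could be substituted.

```python
def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def mira_update(weights, feats_gold, feats_pred, loss, C=1.0):
    """One 1-best MIRA step (hypothetical helper, illustration only).

    Moves the weights just far enough that the gold output outscores the
    predicted output by a margin of at least `loss`, with step size capped at C.
    """
    # Feature difference between the gold and the predicted output.
    delta = dict(feats_gold)
    for name, value in feats_pred.items():
        delta[name] = delta.get(name, 0.0) - value

    norm_sq = sum(value * value for value in delta.values())
    if norm_sq == 0.0:
        return weights  # identical feature vectors: nothing to update

    margin = dot(weights, delta)
    # Closed-form step size for the single-constraint problem, clipped at C.
    tau = min(C, max(0.0, loss - margin) / norm_sq)

    for name, value in delta.items():
        weights[name] = weights.get(name, 0.0) + tau * value
    return weights


# Toy usage with made-up feature names.
w = mira_update({}, {"in_lexicon": 1.0}, {"capitalized": 1.0}, loss=1.0)
```

After this single step the gold output outscores the prediction by exactly the requested loss, so a repeated update on the same example leaves the weights unchanged.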