Penn/UMass/CHOP Biocreative II systems
Abstract
Our team participated in the entity tagging and normalization tasks of Biocreative II. For the
entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically
derived word clusters. MIRA accommodates different training loss functions, which allowed us to
exploit gene alternatives in training. We also performed a greedy search over feature templates
and the development data, achieving a final F-measure of 86.28%. For the normalization task, we
proposed a new specialized on-line learning algorithm and applied it to filter out false positives
from a high-recall list of candidates. For normalization, we achieved an F-measure of 69.8%.
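
To illustrate how a MIRA-style learner can accommodate different training loss functions, the following is a minimal sketch of a simplified 1-best update with a closed-form step size. It is not the k-best variant used in our system, which solves a small quadratic program over several candidate outputs. The function name mira_update, the aggressiveness cap C, and the dense feature vectors are assumptions made for illustration only.

```python
import numpy as np


def mira_update(w, feats_gold, feats_pred, loss, C=1.0):
    """Simplified 1-best MIRA-style update (illustrative, hypothetical helper).

    w          : current weight vector
    feats_gold : feature vector of the reference labeling
    feats_pred : feature vector of the model's current best labeling
    loss       : task loss of the prediction against the reference; any
                 non-negative loss can be plugged in here
    C          : assumed cap on the step size
    """
    diff = feats_gold - feats_pred
    norm_sq = float(diff @ diff)
    if norm_sq == 0.0:
        # Identical feature vectors: no direction to move in.
        return w
    # Current margin of the reference labeling over the prediction.
    margin = float(w @ diff)
    # Smallest step that makes the margin at least the loss, clipped to [0, C].
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    return w + tau * diff


# Toy usage with made-up feature vectors.
w = np.zeros(3)
gold = np.array([1.0, 0.0, 1.0])
pred = np.array([0.0, 1.0, 1.0])
w = mira_update(w, gold, pred, loss=1.0)
```

Because the loss enters only as a number, a loss that treats any accepted gene alternative as correct could be substituted without changing the update itself.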