UniMorph 3.0: Universal Morphology

Christo Kirov
Matteo Grella
Amrit Nidhi
Patrick Xia
Ekaterina Vylomova
Sabrina J. Mielke
Garrett Nicolai
Miikka Silfverberg
Yuval Pinter
Cassandra L. Jacobs
Ryan Cotterell
Mans Hulden
David Yarowsky
LREC (2020)

Abstract

The Universal Morphology (UniMorph) project is a collaborative effort providing morphological information for NLP tasks. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline that creates most of our data, so that the data are both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.