82 Treebanks, 34 Models: Universal Dependency Parsing with Cross-Treebank Models

Aaron Smith; Bernd Bohnet; Joakim Nivre; Miryam de Lhoneux; Sara Stymne; Yan Shao

Conference on Computational Natural Language Learning (2018)

Download Google Scholar

Abstract

We present the Uppsala system for the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Our system is a pipeline of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a separate parsing model for each treebank, we trained models on multiple treebanks for the same language or closely related languages, greatly reducing the number of models (34 models for 82 treebanks). On the official test run, we achieved a macro-averaged LAS F1 of 72.37 and a macro-averaged MLAS F1 of 59.20, ranking 7th of 27 teams on both metrics.
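A macro-averaged score computes the metric separately for each treebank and then takes the unweighted mean, so a small treebank counts as much as a large one. Because the system segments raw text itself, its number of predicted words can differ from the gold standard, which is why LAS is reported as an F1 over words rather than as plain accuracy. The sketch below illustrates this under stated assumptions: words are matched by exact character span (the official CoNLL 2018 evaluation script uses a more involved alignment), and the word representation and the names `las_f1` and `macro_las_f1` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the raw text
# Hypothetical word representation: (span, head_span, deprel); head_span is None for the root.
Word = Tuple[Span, Optional[Span], str]


def las_f1(gold: List[Word], system: List[Word]) -> float:
    """LAS F1 (in percent) for one treebank.

    Simplification (assumption): a system word is correct only if a gold word has
    exactly the same character span, the same head span, and the same label.
    """
    gold_index = {span: (head, rel) for span, head, rel in gold}
    correct = sum(1 for span, head, rel in system if gold_index.get(span) == (head, rel))
    if correct == 0:
        return 0.0
    precision = correct / len(system)  # correct / words the system produced
    recall = correct / len(gold)       # correct / words in the gold standard
    return 100 * 2 * precision * recall / (precision + recall)


def macro_las_f1(treebanks: Dict[str, Tuple[List[Word], List[Word]]]) -> float:
    """Unweighted mean of per-treebank LAS F1: every treebank counts equally."""
    return mean(las_f1(gold, system) for gold, system in treebanks.values())
```

For example, if one treebank scores about 66.7 and another scores 100, the macro average is about 83.3 regardless of how many words each treebank contains.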

Research Areas

Natural language processing