Google Research

A Gold Standard Dependency Treebank for Turkish

Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association (2020), pp. 5156-5163

Abstract

We introduce TWT, a new treebank for Turkish consisting of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech, and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of annotated word count. It is also the first large Turkish dependency treebank with a dedicated Wikipedia section. We present the tagsets and methodology used to annotate the treebank, along with the results of baseline experiments on Turkish dependency parsing with this treebank.
