Predicting the Features of World Atlas of Language Structures from Speech

Alexander Gutkin; Tatiana Merkulova; Martin Jansche

Predicting the Features of World Atlas of Language Structures from Speech

Alexander Gutkin

Tatiana Merkulova

Martin Jansche

Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), International Speech Communication Association (ISCA), 29--31 August, Gurugram, India (2018), pp. 243-247

Download Google Scholar

Abstract

We present a novel task that involves prediction of linguistic typological features from the World Atlas of Language Structures (WALS) from multilingual speech. We frame this task as a multi-label classification involving predicting the set of non-mutually exclusive and extremely sparse multi-valued WALS features. We investigate whether the speech modality has enough signals for an RNN to reliably discriminate between the typological features for languages which are included in the training data as well as languages withheld from the training. We show that the proposed approach can identify typological features with the overall accuracy of 91.6% for the 16 in-domain and 71.1% for 19 held-out languages. In addition, our approach outperforms language identification-based baselines on all the languages. Also, we show that correctly identifying all the typological features for an unseen language is still a distant goal: for 14 languages out of 19 the prediction error is well above 30%.

Research Areas

Natural language processing

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Predicting the Features of World Atlas of Language Structures from Speech

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs