Evaluating Cross Lingual Transfer for Morphological Analysis: a Case Study of Indian Languages

Partha Talukdar; Pushpak Bhattacharyya; Siddhesh Pawar

Submitting to EMNLP 2022 (2023)

Abstract

Pretrained multilingual models such as mBERT and multilingual T5 (mT5) have been successful at many Natural Language Processing tasks. The shared representations learned by these models facilitate cross-lingual transfer in low-resource settings. In this work, we study the usability of these models for morphological analysis tasks such as root word extraction and morphological feature tagging for Indian languages. In particular, we use the mT5 model to train a gender, number and person (GNP) tagger for languages from two Indian language families. We use data from six Indian languages (Marathi, Hindi, Bengali, Tamil, Telugu and Kannada) to fine-tune a multilingual GNP tagger and root word extractor.
We demonstrate the usability of multilingual models for few-shot cross-lingual transfer through controlled experiments and through an average 7% increase in GNP tagging performance in cross-lingual settings compared to a monolingual setting. We also provide insights into the cross-lingual transfer of morphological tags for verbs and nouns, which also serves as a proxy for the quality of the multilingual representations of word markers learned by the model.

Research Areas

Natural language processing
