Evaluating Cross Lingual Transfer for Morphological Analysis: a Case Study of Indian Languages

Pushpak Bhattacharyya
Siddhesh Pawar
Submitting to EMNLP 2022(2023)


Pretrained multilingual models such as mBERT and multilingual T5 (mT5) have been successful at many Natural Language Processing tasks. The shared representations learned by these models facilitate cross lingual transfer in case of low resource settings. In this work, we study the usability of these models for morphology analysis tasks such as root word extraction and morphological feature tagging for Indian langauges. In particular, we use the mT5 model to train gender, number and person tagger for langauges from 2 Indian families. We use data from 6 Indian langauges: Marathi, Hindi, Bengali, Tamil, Telugu and Kannada to fine-tune a multilingual GNP tagger and root word extractor. We demonstrate the usability of multilingual models for few shot cross-lingual transfer through an average 7\% increase in GNP tagging in case of cross-lingual settings as compared to a monolingual setting and through controlled experiments. We also provide insights into cross-lingual transfer of morphological tags for verbs and nouns; which also provides a proxy for quality of the multilingual representations of word markers learned by the model.