Google Research

Learning from Relatives: Unified Dialectal Arabic Segmentation

  • Younes Samih
  • Mohamed Eldesouki
  • Mohammed Attia
  • Ahmed Abdelali
  • Hamdy Mubarak
  • Kareem Darwish
  • Laura Kallmeyer
CoNLL, Vancouver, Canada (2017)

Abstract

Arabic dialects not only share a common koine; they also exhibit shared pan-dialectal linguistic phenomena that allow computational models for different dialects to learn from each other. In this paper, we build a unified segmentation model in which the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We find that linguistic relatedness is contingent on geographical proximity. In our experiments, we use SVM-based ranking and bi-LSTM-CRF sequence labeling.
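
The evaluation protocol described in the abstract (train on one dialect and test on each of the others, then compare against a single model trained on the pooled data) can be sketched roughly as follows. This is only an illustration of the experimental loop, not the paper's method: the memorizing lookup-table "segmenter" stands in for the SVM-based ranker and the bi-LSTM-CRF tagger, and the dialect labels `A` and `B`, the placeholder tokens, and all function names are hypothetical.

```python
from itertools import product


def train(examples):
    """Toy stand-in for a segmenter: memorize each word's gold segmentation."""
    return dict(examples)


def accuracy(model, examples):
    """Fraction of words segmented exactly as in the gold data.

    Words the model has never seen are left unsegmented.
    """
    correct = sum(model.get(word, word) == gold for word, gold in examples)
    return correct / len(examples)


def cross_dialect_matrix(train_sets, test_sets):
    """Train one model per dialect and score it on every dialect's test set."""
    models = {dialect: train(ex) for dialect, ex in train_sets.items()}
    return {
        (src, tgt): accuracy(models[src], test_sets[tgt])
        for src, tgt in product(train_sets, test_sets)
    }


def unified_scores(train_sets, test_sets):
    """Pool all training data, train a single model, score it per dialect."""
    pooled = [pair for examples in train_sets.values() for pair in examples]
    model = train(pooled)
    return {tgt: accuracy(model, examples) for tgt, examples in test_sets.items()}


# Placeholder data (NOT real Arabic): "+" marks a segment boundary.
train_sets = {
    "A": [("wxy", "w+xy"), ("xy", "xy")],
    "B": [("bxy", "b+xy")],
}
test_sets = {
    "A": [("wxy", "w+xy"), ("bxy", "b+xy")],
    "B": [("bxy", "b+xy"), ("xy", "xy")],
}

matrix = cross_dialect_matrix(train_sets, test_sets)
unified = unified_scores(train_sets, test_sets)

for (src, tgt), score in sorted(matrix.items()):
    print(f"train={src} test={tgt} accuracy={score:.2f}")
for tgt, score in sorted(unified.items()):
    print(f"train=unified test={tgt} accuracy={score:.2f}")
```

In a setup like this, the off-diagonal entries of the matrix serve as a rough proxy for how related two dialects are, while the unified scores show whether pooling the data helps without a prior dialect identification step.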
