Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach

Kareem Darwish; Ahmed Abdelali; Hamdy Mubarak; Younes Samih; Mohammed Attia

The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), Miyazaki, Japan (2018)

Abstract

Arabic is written as a sequence of consonants and long vowels, with short vowels normally omitted. Diacritization attempts to recover these short vowels and is an essential step for Text-to-Speech (TTS) systems. Although automatic diacritization of Modern Standard Arabic (MSA) has received significant attention, limited research has been conducted on diacritization of dialectal Arabic (DA). The phonemic patterns of DA differ greatly from those of MSA and even from one another, which accounts for the well-known difficulty of mutual intelligibility between dialects. With the recent advent of spoken dialog systems (or intelligent personal assistants), dialectal vowel restoration is crucial for allowing systems to speak back to users in their own language variety. In this paper, we present our research and benchmark results on the automatic diacritization of Tunisian and Moroccan Arabic using linear-chain Conditional Random Fields.
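The abstract frames diacritization as restoring short vowels that were omitted from the written text. A common way to cast this for a linear-chain CRF, assumed here rather than taken from the paper, is character-level sequence labeling: each base character receives a label consisting of the diacritic marks that follow it (possibly none). The minimal sketch below shows (1) turning diacritized text into (character, label) training pairs and (2) Viterbi decoding of a linear-chain CRF with given weights. The helper names (`to_char_labels`, `emission_features`, `viterbi`), the character-window feature template, and the weights are all hypothetical illustrations. The paper's actual features and training procedure are not described here, and weight learning is omitted.

```python
from typing import Dict, List, Tuple

# Arabic diacritic code points U+064B..U+0652 (fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def to_char_labels(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split diacritized text into (base char, marks) pairs.

    The label of a base character is the string of diacritics that follow it,
    or "" when it carries none.
    """
    pairs: List[Tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a stray mark with no preceding base character is dropped
        else:
            pairs.append((ch, ""))
    return pairs


def emission_features(chars: List[str], t: int) -> List[str]:
    """Hypothetical feature template: current, previous and next character."""
    prev_c = chars[t - 1] if t > 0 else "<s>"
    next_c = chars[t + 1] if t + 1 < len(chars) else "</s>"
    return [f"c0={chars[t]}", f"c-1={prev_c}", f"c+1={next_c}"]


def viterbi(
    chars: List[str],
    labels: List[str],
    emit_w: Dict[Tuple[str, str], float],
    trans_w: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF.

    Score(y) = sum_t [ sum_f emit_w[(f, y_t)] + trans_w[(y_{t-1}, y_t)] ],
    where missing weights count as 0.
    """
    if not chars:
        return []

    def emit(t: int, y: str) -> float:
        return sum(emit_w.get((f, y), 0.0) for f in emission_features(chars, t))

    # best[t][y]: best score of any labeling of chars[:t+1] that ends in y.
    best: List[Dict[str, float]] = [{y: emit(0, y) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(chars)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in labels:
            # Best previous label for ending in y at position t.
            prev = max(labels, key=lambda p: best[t - 1][p] + trans_w.get((p, y), 0.0))
            scores[y] = best[t - 1][prev] + trans_w.get((prev, y), 0.0) + emit(t, y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)

    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[-1][lab])
    path = [y]
    for t in range(len(chars) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]
```

At inference time only the undiacritized characters are available: `viterbi` assigns each one a label, and the diacritized output is each character followed by its predicted marks. Treating combinations such as shadda plus a vowel as a single label keeps decoding a plain single-tag-per-character problem, at the cost of a larger label set.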

Research Areas

Natural language processing
