Google Research

The Power of Language Music: Arabic Lemmatization through Patterns

Proceedings of the Workshop on Cognitive Aspects of the Lexicon, Osaka, Japan (2016), pp. 40-50


Patterns play a pivotal role in Arabic morphological processing, whether in derivation or inflection. These patterns have not yet been fully utilized in computational processing of the language. The novel contribution of this paper is performing lemmatization (a high-level lexical processing task) without relying on a lookup dictionary. We use a machine learning classifier to predict the lemma pattern for a given stem, and then apply mapping rules to convert stems to their respective lemmas.
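The two-step idea in the abstract (predict a lemma pattern for a stem, then map the stem onto that pattern) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the simplified Latin transliteration (`A` for long a), the `C`-slot template notation, the three-item toy training set, and the frequency-count `PatternClassifier`, which merely stands in for whatever machine learning classifier and feature set the authors actually used.

```python
from collections import Counter, defaultdict

# Assumed transliteration: short vowels a/i/u plus "A" for long a.
# Every other character is treated as a root consonant.
VOWELS = set("aiuA")


def skeleton(stem):
    """Abstract a stem into a template, e.g. 'kutub' -> 'CuCuC'."""
    return "".join(ch if ch in VOWELS else "C" for ch in stem)


def root_consonants(stem):
    """Return the consonants of the stem in order, e.g. ['k', 't', 'b']."""
    return [ch for ch in stem if ch not in VOWELS]


class PatternClassifier:
    """Hypothetical stand-in classifier: majority vote per stem template."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        # examples: iterable of (stem, lemma_pattern) pairs
        for stem, lemma_pattern in examples:
            self.counts[skeleton(stem)][lemma_pattern] += 1
        return self

    def predict(self, stem):
        key = skeleton(stem)
        if key not in self.counts:
            # Unseen template: assume the stem is already in lemma form.
            return key
        return self.counts[key].most_common(1)[0][0]


def apply_pattern(stem, lemma_pattern):
    """Mapping rule: pour the stem's consonants into the lemma pattern."""
    root = root_consonants(stem)
    if lemma_pattern.count("C") != len(root):
        return stem  # slots and consonants do not line up; leave unchanged
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in lemma_pattern)


def lemmatize(stem, clf):
    return apply_pattern(stem, clf.predict(stem))


# Toy training pairs (illustrative only):
# kutub "books" -> kitAb "book", rijAl "men" -> rajul "man",
# kAtib "writer" is already a lemma.
TOY_DATA = [("kutub", "CiCAC"), ("rijAl", "CaCuC"), ("kAtib", "CACiC")]
clf = PatternClassifier().fit(TOY_DATA)

print(lemmatize("kutub", clf))  # kitAb
```

No lemma dictionary is consulted at prediction time: the lemma is assembled from the stem's own consonants and the predicted pattern. A single stem template can correspond to several lemma patterns in practice, which is why a trained classifier with richer features would be needed rather than the majority vote used here.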
