Mohamed Gabr


2021

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task
Badr AlKhamissi | Mohamed Gabr | Muhammad ElNokrashy | Khaled Essam
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all four of its subtasks. The subtasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at both the country and province levels. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% on the country-level development set for DA, an improvement of 7.63% over previous work.

2020

Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization
Badr AlKhamissi | Muhammad ElNokrashy | Mohamed Gabr
Proceedings of the Fifth Arabic Natural Language Processing Workshop

We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately, enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two levels and opens the door to network interpretability. The task module is a softmax classifier that enumerates valid combinations of diacritics. This architecture can be extended with a recurrent decoder that optionally accepts priors from partially diacritized text, which improves results. We employ extra tricks such as sentence dropout and majority voting to further boost the final result. Our best model achieves a WER of 5.34%, a 30.56% relative error reduction over the previous state of the art.
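
The relative error reduction quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the baseline WER it prints is merely what the two quoted figures imply, assuming the reduction is measured on WER. The abstract does not state that baseline.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error that the new system removes."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction.

    Inverts relative_error_reduction: new = baseline * (1 - reduction).
    """
    return new_wer / (1.0 - reduction)


# Figures quoted in the abstract: WER of 5.34% and a 30.56% relative reduction.
# The resulting baseline is an inference from those two numbers, not a reported value.
baseline = implied_baseline(5.34, 0.3056)
print(f"implied previous WER: {baseline:.2f}%")
```

Under that assumption, the previous best WER would be roughly 7.69%, so the absolute gain is a little over two percentage points.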