Mahmoud Reda

2025

ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer
Omer Nacar | Mahmoud Reda | Serry Sibaee | Yasser Alhabashi | Adel Ammar | Wadii Boulila
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

2024

Arabic Diacritization Using Morphologically Informed Character-Level Model
Muhammad Morsy Elmallah | Mahmoud Reda | Kareem Darwish | Abdelrahman El-Sheikh | Ashraf Hatim Elneima | Murtadha Aljubran | Nouf Alsaeed | Reem Mohammed | Mohamed Al-Badrashiny
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Arabic diacritic recovery, i.e., diacritization, is necessary for proper vocalization and is an enabler for downstream applications such as language learning and text-to-speech. Diacritics come in two varieties: core-word diacritics and case endings. In this paper, we introduce a highly effective morphologically informed character-level model that can recover both types of diacritics simultaneously. The model uses a Recurrent Neural Network (RNN) based architecture that takes in text as a sequence of characters, with markers for morphological segmentation, and outputs a sequence of diacritics. We also introduce a character-based morphological segmentation model that we train for Modern Standard Arabic (MSA) and dialectal Arabic. We demonstrate the efficacy of our diacritization model on Classical Arabic, MSA, and two dialectal (Moroccan and Tunisian) texts. We achieve the lowest reported word-level diacritization error rate for MSA (3.4%), match the best results for Classical Arabic (5.4%), and report competitive results for dialectal Arabic.
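To make the character-in, diacritic-out framing and the word-level error rate concrete, here is a minimal Python sketch. It is not the authors' code: the function names `split_diacritics` and `diacritization_wer` are hypothetical, the diacritic inventory is assumed to be the Unicode range U+064B to U+0652, and the paper's actual scoring conventions (for example, how case endings or unvocalized words are counted) may differ. The morphological segmentation markers mentioned in the abstract are not modeled here.

```python
# Assumed diacritic inventory: U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Split a diacritized word into base characters and per-character labels.

    Each label is the (possibly empty) string of diacritics attached to the
    corresponding base character, sorted so that mark order does not matter.
    """
    base, labels = [], []
    for ch in word:
        if ch in DIACRITICS:
            if base:  # a diacritic with no preceding base character is ignored
                labels[-1] += ch
        else:
            base.append(ch)
            labels.append("")
    return base, ["".join(sorted(label)) for label in labels]


def diacritization_wer(reference, hypothesis):
    """Fraction of words whose diacritic labels differ from the reference.

    Assumes both strings contain the same whitespace-separated words and
    differ only in their diacritics.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")
    if not ref_words:
        return 0.0
    errors = sum(
        split_diacritics(r)[1] != split_diacritics(h)[1]
        for r, h in zip(ref_words, hyp_words)
    )
    return errors / len(ref_words)


if __name__ == "__main__":
    # Two words; the hypothesis gets the final vowel of the first word wrong.
    ref = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E"
    hyp = "\u0643\u064E\u062A\u064E\u0628\u064F \u0648\u064E"
    print(diacritization_wer(ref, hyp))
```

Under this framing, a sequence model only has to predict one label per input character, and a word counts as an error if any of its labels is wrong.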

Co-authors

Mohamed Al-Badrashiny 1

Murtadha Aljubran 1

Nouf Alsaeed 1

Kareem Darwish 1

Abdelrahman El-Sheikh 1

Muhammad Morsy Elmallah 1

Ashraf Hatim Elneima 1

Reem Mohammed 1

