Aya El aatar
Also published as: Aya El Aatar
2026
Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs
Abdellah EL Mekki | Samar M. Magdy | Houdaifa Atou | Ruwa AbuHweidi | Baraah Qawasmeh | Omer Nacar | Thikra Al-hibiri | Razan Saadie | Hamzah A. Alsayadi | Nadia Ghezaiel Hammouda | Alshima Mohammed Alkhazimi | Aya Hamod | Al-Yas Yaqoob Al-Ghafri | Wesam El-Sayed | Asila Ismail al Sharji | Mohamad Ballout | Anas Belfathi | Karim Ghaddar | Serry Sibaee | Alaa Aoun | Aeej Mohammed Aseri | Lina Abureesh | Ahlam Bashiti | Majdal Yousef | Abdulaziz Hafiz | Yehdih Mohamed | Emira Hamedtou | Brakehe Emehah | Rahaf Alhamouri | Youssef Nafea | Aya El Aatar | Walid Al-Dhabyani | Emhemed S. Hamed | Sara Shatnawi | Fakhraddin Alwajih | Khalid Elkhidir | Ashwag Alasmari | Abdurrahman Gerrio | Omar Said Alshahri | AbdelRahim A. Elmadany | Ismail Berrada | Amir Azad Adli Al-kathiri | Fadi Zaraket | Mustafa Jarrar | Yahya Mohamed EL Hadj | Hassan Alhuzali | Muhammad Abdul-Mageed
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Arabic is a highly diglossic language where most daily communication occurs in regional dialects rather than Modern Standard Arabic (MSA). Despite this, machine translation (MT) systems often generalize poorly to dialectal input, limiting their utility for millions of speakers. We introduce Alexandria, a large-scale, community-driven, human-translated dataset designed to bridge this gap. Alexandria covers 13 Arab countries and 11 high-impact domains, including health, education, and agriculture. Unlike previous resources, Alexandria provides unprecedented granularity by associating contributions with city-of-origin metadata, capturing authentic local varieties beyond coarse regional labels. The dataset consists of parallel English-Dialectal Arabic multi-turn conversational scenarios annotated with speaker-addressee gender configurations, enabling the study of gender-conditioned variation in dialectal use. Comprising 107K total turns, Alexandria serves as both a training resource and as a rigorous benchmark for evaluating MT and Large Language Models (LLMs). Our automatic and human evaluation benchmarks the current capabilities of Arabic-aware LLMs in translating across diverse Arabic dialects and sub-dialects while exposing significant persistent challenges. The Alexandria dataset, the creation prompts, the translation and revision guidelines, and the evaluation code are publicly available in the following repository: https://github.com/UBC-NLP/Alexandria
2025
Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Fakhraddin Alwajih | Samar M. Magdy | Abdellah El Mekki | Omer Nacar | Youssef Nafea | Safaa Taher Abdelfadil | Abdulfattah Mohammed Yahya | Hamzah Luqman | Nada Almarwani | Samah Aloufi | Baraah Qawasmeh | Houdaifa Atou | Serry Sibaee | Hamzah A. Alsayadi | Walid Al-Dhabyani | Maged S. Al-shaibani | Aya El aatar | Nour Qandos | Rahaf Alhamouri | Samar Ahmad | Mohammed Anwar AL-Ghrawi | Aminetou Yacoub | Ruwa AbuHweidi | Vatimetou Mohamed Lemin | Reem Abdel-Salam | Ahlam Bashiti | Adel Ammar | Aisha Alansari | Ahmed Ashraf | Nora Alturayeif | Alcides Alcoba Inciarte | AbdelRahim A. Elmadany | Mohamedou Cheikh Tourad | Ismail Berrada | Mustafa Jarrar | Shady Shehata | Muhammad Abdul-Mageed
Findings of the Association for Computational Linguistics: EMNLP 2025
Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models’ cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available.
Co-authors
- Muhammad Abdul-Mageed 2
- Ruwa AbuHweidi 2
- Walid Al-Dhabyani 2
- Rahaf Alhamouri 2
- Hamzah A. Alsayadi 2
- Fakhraddin Alwajih 2
- Houdaifa Atou 2
- Ahlam Bashiti 2
- Ismail Berrada 2
- Abdellah El Mekki 2
- AbdelRahim A. Elmadany 2
- Mustafa Jarrar 2
- Samar Mohamed Magdy 2
- Omer Nacar 2
- Youssef Nafea 2
- Baraah Qawasmeh 2
- Serry Sibaee 2
- Mohammed Anwar AL-Ghrawi 1
- Reem Abdel-Salam 1
- Safaa Taher Abdelfadil 1
- Lina Abureesh 1
- Samar Ahmad 1
- Al-Yas Yaqoob Al-Ghafri 1
- Thikra Al-hibiri 1
- Amir Azad Adli Al-kathiri 1
- Maged S. Al-shaibani 1
- Aisha Alansari 1
- Ashwag Alasmari 1
- Hassan Alhuzali 1
- Alshima Mohammed Alkhazimi 1
- Nada Almarwani 1
- Samah Aloufi 1
- Omar Said Alshahri 1
- Nora Alturayeif 1
- Adel Ammar 1
- Alaa Aoun 1
- Aeej Mohammed Aseri 1
- Ahmed Ashraf 1
- Mohamad Ballout 1
- Anas Belfathi 1
- Yahya Mohamed EL Hadj 1
- Wesam El-Sayed 1
- Khalid Elkhidir 1
- Brakehe Emehah 1
- Abdurrahman Gerrio 1
- Karim Ghaddar 1
- Abdulaziz Hafiz 1
- Emhemed S. Hamed 1
- Emira Hamedtou 1
- Nadia Ghezaiel Hammouda 1
- Aya Hamod 1
- Alcides Alcoba Inciarte 1
- Vatimetou Mohamed Lemin 1
- Hamzah Luqman 1
- Yehdih Mohamed 1
- Nour Qandos 1
- Razan Saadie 1
- Sara Shatnawi 1
- Shady Shehata 1
- Mohamedou Cheikh Tourad 1
- Aminetou Yacoub 1
- Abdulfattah Mohammed Yahya 1
- Majdal Yousef 1
- Fadi A. Zaraket 1
- Asila Ismail al Sharji 1