Sara Shatnawi


2025

pdf bib
Commonsense Reasoning in Arab Culture
Abdelrahman Sadallah | Junior Cedric Tonga | Khalid Almubarak | Saeed Almheiri | Farah Atif | Chatrine Qwaider | Karima Kadaoui | Sara Shatnawi | Yaser Alesh | Fajri Koto
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite progress in Arabic large language models, such as Jais and AceGPT, their evaluation on commonsense reasoning has largely relied on machine-translated datasets, which lack cultural depth and may introduce Anglocentric biases. Commonsense reasoning is shaped by geographical and cultural contexts, and existing English datasets fail to capture the diversity of the Arab world. To address this, we introduce , a commonsense reasoning dataset in Modern Standard Arabic (MSA), covering cultures of 13 countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset was built from scratch by engaging native speakers to write and validate culturally relevant questions for their respective countries. spans 12 daily life domains with 54 fine-grained subtopics, reflecting various aspects of social norms, traditions, and everyday experiences. Zero-shot evaluations show that open-weight language models with up to 32B parameters struggle to comprehend diverse Arab cultures, with performance varying across regions. These findings highlight the need for more culturally aware models and datasets tailored to the Arabic-speaking world.

pdf bib
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Fakhraddin Alwajih | Abdellah El Mekki | Samar Mohamed Magdy | AbdelRahim A. Elmadany | Omer Nacar | El Moatez Billah Nagoudi | Reem Abdel-Salam | Hanin Atwany | Youssef Nafea | Abdulfattah Mohammed Yahya | Rahaf Alhamouri | Hamzah A. Alsayadi | Hiba Zayed | Sara Shatnawi | Serry Sibaee | Yasir Ech-chammakhy | Walid Al-Dhabyani | Marwa Mohamed Ali | Imen Jarraya | Ahmed Oumar El-Shangiti | Aisha Alraeesi | Mohammed Anwar AL-Ghrawi | Abdulrahman S. Al-Batati | Elgizouli Mohamed | Noha Taha Elgindi | Muhammed Saeed | Houdaifa Atou | Issam Ait Yahia | Abdelhak Bouayad | Mohammed Machrouh | Amal Makouar | Dania Alkawi | Mukhtar Mohamed | Safaa Taher Abdelfadil | Amine Ziad Ounnoughene | Anfel Rouabhia | Rwaa Assi | Ahmed Sorkatti | Mohamedou Cheikh Tourad | Anis Koubaa | Ismail Berrada | Mustafa Jarrar | Shady Shehata | Muhammad Abdul-Mageed
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce PALM, a year-long community-driven project covering all 22 Arab countries. The dataset contains instruction–response pairs in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world—each an author of this paper—PALM offers a broad, inclusive perspective. We use PALM to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations: while closed-source LLMs generally perform strongly, they still exhibit flaws, and smaller open-source models face greater challenges. Furthermore, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data are publicly available for reproducibility. More information about PALM is available on our project page: https://github.com/UBC-NLP/palm.

2024

pdf bib
Data Augmentation for Speech-Based Diacritic Restoration
Sara Shatnawi | Sawsan Alqahtani | Shady Shehata | Hanan Aldarmaki
Proceedings of the Second Arabic Natural Language Processing Conference

This paper describes a data augmentation technique for boosting the performance of speech-based diacritic restoration. Our experiments demonstrate the utility of this appraoch, resulting in improved generalization of all models across different test sets. In addition, we describe the first multi-modal diacritic restoration model, utilizing both speech and text as input modalities. This type of model can be used to diacritize speech transcripts. Unlike previous work that relies on an external ASR model, the proposed model is far more compact and efficient. While the multi-modal framework does not surpass the ASR-based model for this task, it offers a promising approach for improving the efficiency of speech-based diacritization, with a potential for improvement using data augmentation and other methods.

pdf bib
Casablanca: Data and Models for Multidialectal Arabic Speech Recognition
Bashar Talafha | Karima Kadaoui | Samar Mohamed Magdy | Mariem Habiboullah | Chafei Mohamed Chafei | Ahmed Oumar El-Shangiti | Hiba Zayed | Mohamedou Cheikh Tourad | Rahaf Alhamouri | Rwaa Assi | Aisha Alraeesi | Hour Mohamed | Fakhraddin Alwajih | Abdelrahman Mohamed | Abdellah El Mekki | El Moatez Billah Nagoudi | Benelhadj Djelloul Mama Saadia | Hamzah A. Alsayadi | Walid Al-Dhabyani | Sara Shatnawi | Yasir Ech-chammakhy | Amal Makouar | Yousra Berrachedi | Mustafa Jarrar | Shady Shehata | Ismail Berrada | Muhammad Abdul-Mageed
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In spite of the recent progress in speech processing, the majority of world languages and dialects remain uncovered. This situation only furthers an already wide technological divide, thereby hindering technological and socioeconomic inclusion. This challenge is largely due to the absence of datasets that can empower diverse speech systems. In this paper, we seek to mitigate this obstacle for a number of Arabic dialects by presenting Casablanca, a large-scale community-driven effort to collect and transcribe a multi-dialectal Arabic dataset. The dataset covers eight dialects: Algerian, Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni, and includes annotations for transcription, gender, dialect, and code-switching. We also develop a number of strong baselines exploiting Casablanca. The project page for Casablanca is accessible at: www.dlnlp.ai/speech/casablanca.

pdf bib
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
Fajri Koto | Haonan Li | Sara Shatnawi | Jad Doughman | Abdelrahman Sadallah | Aisha Alraeesi | Khalid Almubarak | Zaid Alyafeai | Neha Sengupta | Shady Shehata | Nizar Habash | Preslav Nakov | Timothy Baldwin
Findings of the Association for Computational Linguistics: ACL 2024

The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models. While state-of-the-art models are partially trained on large Arabic texts, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region. Our comprehensive evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models. Notably, BLOOMZ, mT0, LLama2, and Falcon struggle to achieve a score of 50%, while even the top-performing Arabic-centric model only achieves a score of 62.3%.

pdf bib
Automatic Restoration of Diacritics for Speech Data Sets
Sara Shatnawi | Sawsan Alqahtani | Hanan Aldarmaki
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language. In this work, we explore the possibility of improving the performance of automatic diacritic restoration when applied to speech data by utilizing parallel spoken utterances. In particular, we use the pre-trained Whisper ASR model fine-tuned on relatively small amounts of diacritized Arabic speech data to produce rough diacritized transcripts for the speech utterances, which we then use as an additional input for diacritic restoration models. The proposed framework consistently improves diacritic restoration performance compared to text-only baselines. Our results highlight the inadequacy of current text-based diacritic restoration models for speech data sets and provide a new baseline for speech-based diacritic restoration.