Sharefah Ahmed Al-Ghamdi
2026
ADAB: Arabic Dataset for Automated Politeness Benchmarking - a Large-Scale Resource for Computational Sociopragmatics
Hend Al-Khalifa | Nadia Ghezaiel | Maria Bounnit | Hend Hamed Alhazmi | Noof Abdullah Alfear | Reem Fahad Alqifari | Ameera Masoud Almasoud | Sharefah Ahmed Al-Ghamdi
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The growing importance of culturally aware natural language processing systems has increased the demand for resources that capture sociopragmatic phenomena across diverse languages. Nevertheless, Arabic politeness detection remains severely under-explored and under-resourced, despite the rich and complex politeness expressions deeply embedded in Arabic communication. In this paper, we present ADAB/أدب (Arabic Politeness Dataset), a new annotated Arabic dataset collected from four diverse online platforms covering social media, e-commerce, and customer service domains, and encompassing both Modern Standard Arabic (MSA) and multiple dialectal varieties (Gulf, Egyptian, Levantine, and Maghrebi). The dataset underwent a thorough annotation process guided by Arabic linguistic traditions and contemporary pragmatic theory, resulting in a three-way politeness classification: polite, impolite, and neutral. It contains 10,000 samples with detailed linguistic feature annotations across 16 politeness categories, and achieves substantial inter-annotator agreement (κ = 0.703). We benchmark the dataset with 40 model configurations spanning traditional machine learning (12 models), transformer-based architectures (10 models), and large language models (18 configurations), demonstrating both its practical utility and its inherent challenges. This resource aims to bridge the gap in Arabic sociopragmatic NLP and to encourage further research into politeness-aware applications for the Arabic language.
2024
A Novel Approach for Root Selection in the Dependency Parsing
Sharefah Ahmed Al-Ghamdi | Hend Al-Khalifa | Abdulmalik AlSalman
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
Although syntactic analysis using the sequence labeling method is promising, it can be problematic when the predicted label sequence does not contain a root label. This can result in errors in the final parse tree when the postprocessing method assumes that the first word is the root. In this paper, we present a novel postprocessing method for BERT-based dependency parsing as sequence labeling. Our method leverages the root’s part-of-speech tag to select a more suitable root for the dependency tree, instead of defaulting to the first token. We conducted experiments on nine dependency treebanks from different languages and domains, and demonstrated that our technique improves the labeled attachment score (LAS) on most of them.
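The root-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_root`, the head encoding (1-based head indices, with 0 marking the root), and the `root_pos_priority` list are all assumptions made for illustration. The sketch only handles the case where no token was labeled as root, and it does not repair cycles or multiple roots.

```python
from typing import List, Sequence


def select_root(
    heads: Sequence[int],
    pos_tags: Sequence[str],
    root_pos_priority: Sequence[str] = ("VERB", "NOUN"),  # assumed priority order
) -> List[int]:
    """Return a copy of `heads` that is guaranteed to contain a root (head == 0).

    Hypothetical encoding: heads[i] is the 1-based index of token i's head,
    and 0 marks the root token.
    """
    fixed = list(heads)
    if not fixed or 0 in fixed:
        # Empty sentence, or the label sequence already has a root: no change.
        return fixed

    # Prefer the first token whose POS tag is a likely root tag.
    for pos in root_pos_priority:
        for i, tag in enumerate(pos_tags):
            if tag == pos:
                fixed[i] = 0
                return fixed

    # Fall back to the default behaviour: treat the first token as root.
    fixed[0] = 0
    return fixed


# No token was predicted as root; the verb is chosen instead of the first token.
print(select_root([2, 3, 2], ["DET", "NOUN", "VERB"]))
```

In practice the priority list would presumably be derived from the treebank (for example, from the POS tags most often carried by root tokens in the training data), which is an assumption here rather than something the abstract states.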