Sadam Al-Azani


Fixing paper assignments

  1. Please select all papers that do not belong to this person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Enhancing Arabic NLP Tasks through Character-Level Models and Data Augmentation
Mohanad Mohamed | Sadam Al-Azani
Proceedings of the 31st International Conference on Computational Linguistics

This study introduces a character-level approach specifically designed for Arabic NLP tasks, offering a novel and highly effective solution to the unique challenges inherent in Arabic language processing. It presents a thorough comparative study of various character-level models, including Convolutional Neural Networks (CNNs), pre-trained transformers (CANINE), and Bidirectional Long Short-Term Memory networks (BiLSTMs), assessing their performance and exploring the impact of different data augmentation techniques on enhancing their effectiveness. Additionally, it introduces two innovative Arabic-specific data augmentation methods—vowel deletion and style transfer—and rigorously evaluates their effectiveness. The proposed approach was evaluated on Arabic privacy policy classification task as a case study, demonstrating significant improvements in model performance, reporting a micro-averaged F1-score of 93.8%, surpassing state-of-the-art models.

pdf bib
OntologyRAG-Q: Resource Development and Benchmarking for Retrieval-Augmented Question Answering in Qur’anic Tafsir
Sadam Al-Azani | Maad Alowaifeer | Alhanoof Alhunief | Ahmed Abdelali
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

This paper introduces essential resources for Qur’anic studies: an annotated Tafsir ontology, a dataset of approximately 4,200 question-answer pairs, and a collection of 15 structured Tafsir books available in two formats. We present a comprehensive framework for handling sensitive Qur’anic Tafsir data that spans the entire pipeline from dataset construction through evaluation and error analysis. Our work establishes new benchmarks for retrieval and question-answering tasks on Qur’anic content, comparing performance across state-of-the-art embedding models and large language models (LLMs).We introduce OntologyRAG-Q, a novel retrieval-augmented generation approach featuring our custom Ayat-Ontology chunking method that segments Tafsir content at the verse level using ontology-driven structure. Benchmarking reveals strong performance across various LLMs, with GPT-4 achieving the highest results, followed closely by ALLaM. Expert evaluations show our system achieves 69.52% accuracy and 74.36% correctness overall, though multi-hop and context-dependent questions remain challenging. Our analysis demonstrates that answer position within documents significantly impacts retrieval performance, and among the evaluation metrics tested, BERT-recall and BERT-F1 correlate most strongly with expert assessments. The resources developed in this study are publicly available at https://github.com/sazani/OntologyRAG-Q.git.