This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
ShathaAltammami
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
This paper introduces the Hadith Simplification Dataset, a novel resource comprising 250 pairs of Classical Arabic (CA) Hadith texts and their simplified Modern Standard Arabic (MSA) equivalents. Addressing the lack of resources for simplifying culturally and religiously significant texts, this dataset bridges linguistic and accessibility gaps while preserving theological integrity. The simplifications were generated using a large language model and rigorously verified by an Islamic Studies expert to ensure precision and cultural sensitivity. By tackling the unique lexical, syntactic, and cultural challenges of CA-to-MSA transformation, this resource advances Arabic text simplification research. Beyond religious texts, the methodology developed is adaptable to other domains, such as poetry and historical literature. This work underscores the importance of ethical AI applications in preserving the integrity of religious texts while enhancing their accessibility to modern audiences.
Transformer-based models showed near-perfect results on several downstream tasks. However, their performance on classical Arabic texts is largely unexplored. To fill this gap, we evaluate monolingual, bilingual, and multilingual state-of-the-art models to detect relatedness between the Quran (Muslim holy book) and the Hadith (Prophet Muhammed teachings), which are complex classical Arabic texts with underlying meanings that require deep human understanding. To do this, we carefully built a dataset of Quran-verse and Hadith-teaching pairs by consulting sources of reputable religious experts. This study presents the methodology of creating the dataset, which we make available on our repository, and discusses the models’ performance that calls for the imminent need to explore avenues for improving the quality of these models to capture the semantics in such complex, low-resource texts.
This article describes the process of gathering and constructing a bilingual parallel corpus of Islamic Hadith, which is the set of narratives reporting different aspects of the prophet Muhammad’s life. The corpus data is gathered from the six canonical Hadith collections using a custom segmentation tool that automatically segments and annotates the two Hadith components with 92% accuracy. This Hadith segmenter minimises the costs of language resource creation and produces consistent results independently from previous knowledge and experiences that usually influence human annotators. The corpus includes more than 10M tokens and will be freely available via the LREC repository.