Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith

Shatha Altammami, Eric Atwell


Abstract
Transformer-based models have shown near-perfect results on several downstream tasks. However, their performance on classical Arabic texts is largely unexplored. To fill this gap, we evaluate monolingual, bilingual, and multilingual state-of-the-art models on detecting relatedness between the Quran (the Muslim holy book) and the Hadith (the Prophet Muhammed's teachings), complex classical Arabic texts whose underlying meanings require deep human understanding. To do this, we carefully built a dataset of Quran-verse and Hadith-teaching pairs by consulting sources from reputable religious experts. This study presents the methodology used to create the dataset, which we make available in our repository, and discusses the models' performance, which points to a pressing need to explore avenues for improving these models' ability to capture the semantics of such complex, low-resource texts.
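As a rough illustration of the kind of pair-relatedness evaluation the abstract describes, the sketch below scores a Quran-verse / Hadith-teaching pair with an off-the-shelf multilingual sentence encoder. The model name, placeholder texts, and similarity threshold are illustrative assumptions, not the authors' actual setup.

```python
# A minimal sketch (not the paper's exact pipeline) of scoring
# verse-teaching pairs for semantic relatedness with a multilingual
# sentence-embedding model that covers Arabic.
from sentence_transformers import SentenceTransformer, util

# Any multilingual encoder with Arabic coverage could be substituted here.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Placeholder pair; in practice these would be the Arabic texts
# from the Quran-verse / Hadith-teaching dataset.
pairs = [
    ("quran verse text (Arabic)", "hadith teaching text (Arabic)"),
]

for verse, hadith in pairs:
    # Encode both texts and compare them with cosine similarity.
    emb = model.encode([verse, hadith], convert_to_tensor=True)
    score = util.cos_sim(emb[0], emb[1]).item()
    # 0.5 is an arbitrary illustrative cut-off, not a value from the paper.
    label = "related" if score > 0.5 else "unrelated"
    print(f"similarity={score:.3f} -> {label}")
```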
Anthology ID: 2022.lrec-1.157
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1462–1471
URL: https://aclanthology.org/2022.lrec-1.157
Cite (ACL): Shatha Altammami and Eric Atwell. 2022. Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1462–1471, Marseille, France. European Language Resources Association.
Cite (Informal): Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith (Altammami & Atwell, LREC 2022)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2022.lrec-1.157.pdf
Data: SICK