Enhancing State-of-the-Art NLP Models for Classical Arabic

Tariq Yousef, Lisa Mischer, Hamid Reza Hakimi, Maxim Romanov


Abstract
Classical Arabic, like all other historical languages, lacks adequate training datasets and accurate “off-the-shelf” models that can be employed directly in processing pipelines. In this paper, we present our work in progress on developing and training deep learning models tailored to diverse tasks relevant to Classical Arabic texts. Specifically, we focus on named entity recognition, person relationship classification, toponym sub-classification, onomastic section boundary detection, onomastic entity classification, and date recognition and classification. Our work aims to address the challenges associated with these tasks and to provide effective solutions for analyzing Classical Arabic texts. Although this work is still in progress, the preliminary results reported in the paper indicate excellent to satisfactory performance of the fine-tuned models, meeting the goals for which they were trained.
Anthology ID:
2023.alp-1.19
Volume:
Proceedings of the Ancient Language Processing Workshop
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti
Venues:
ALP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
160–169
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.alp-1.19/
Cite (ACL):
Tariq Yousef, Lisa Mischer, Hamid Reza Hakimi, and Maxim Romanov. 2023. Enhancing State-of-the-Art NLP Models for Classical Arabic. In Proceedings of the Ancient Language Processing Workshop, pages 160–169, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Enhancing State-of-the-Art NLP Models for Classical Arabic (Yousef et al., ALP 2023)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.alp-1.19.pdf