2025
Arabic-Centric Large Language Models for Dialectal Arabic Sentiment Analysis Task
Salwa Saad Alahmari | Eric Atwell | Hadeel Saadany | Mohammad Alsalka
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects
This paper presents a study on sentiment analysis of Dialectal Arabic (DA), with a particular focus on Saudi and Moroccan (Darija) dialects within the hospitality domain. We introduce a novel dataset comprising 698 Saudi Arabian proverbs annotated with sentiment polarity labels (Positive, Negative, and Neutral), collected from five major Saudi dialect regions: Najdi, Hijazi, Shamali, Janoubi, and Sharqawi. In addition, we used customer reviews to fine-tune the CAMeLBERT-DA-SA model, which achieved a 75% F1 score in sentiment classification. To further evaluate the robustness of Arabic-centric models, we assessed the performance of three open-source large language models (Allam, AceGPT, and Jais) in a zero-shot setting using the Ahasis shared task test set. Our results highlight the effectiveness of domain-specific fine-tuning in improving sentiment analysis performance and demonstrate the potential of Arabic-centric LLMs in zero-shot scenarios. This work contributes new linguistic resources and empirical insights to support ongoing research in sentiment analysis for Arabic dialects.
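As a rough illustration of the fine-tuning step described above, the sketch below fine-tunes a CAMeLBERT-DA sentiment checkpoint on review text with Hugging Face Transformers. The checkpoint name, the two in-memory examples, the label mapping, and the hyperparameters are assumptions for illustration, not the authors' exact setup or data.

```python
# Minimal sketch: fine-tuning a CAMeLBERT-DA sentiment model on hotel-review text.
# Checkpoint, example data, label mapping, and hyperparameters are assumed, not the paper's.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL = "CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment"  # assumed base checkpoint
LABELS = {"Positive": 0, "Negative": 1, "Neutral": 2}        # assumed label mapping

# Placeholder training examples standing in for the annotated customer reviews.
train = Dataset.from_dict({
    "text": ["الفندق نظيف والخدمة ممتازة",   # "the hotel is clean and the service is excellent"
             "الغرفة صغيرة والسعر غالي"],     # "the room is small and the price is high"
    "label": [LABELS["Positive"], LABELS["Negative"]],
})

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=3, ignore_mismatched_sizes=True)

def encode(batch):
    # Tokenize review text into fixed-length input IDs for the classifier.
    return tok(batch["text"], truncation=True, padding="max_length", max_length=128)

train = train.map(encode, batched=True)

args = TrainingArguments(output_dir="camelbert-da-hotel", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train).train()
```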
2023
LKAU23 at Qur’an QA 2023: Using Transformer Models for Retrieving Passages and Finding Answers to Questions from the Qur’an
Sarah Alnefaie | Abdullah Alsaleh | Eric Atwell | Mohammad Alsalka | Abdulrahman Altahhan
Proceedings of ArabicNLP 2023
The Qur’an QA 2023 shared task has two subtasks: the Passage Retrieval (PR) task and the Machine Reading Comprehension (MRC) task. Our participation in the PR task was to further train several Arabic pre-trained models using a Sentence-Transformers architecture and to ensemble the best-performing models. The results on the test set did not reflect the results on the development set: CL-AraBERT achieved the best results, with a 0.124 MAP. We also participated in the MRC task by further fine-tuning the base and large variants of AraBERT using Classical Arabic and Modern Standard Arabic datasets. Base AraBERT achieved the best result on the development set with a partial average precision (pAP) of 0.49, while it achieved 0.5 on the test set. In addition, we applied an ensemble of the best-performing models and post-processing steps to the final results. Our experiments with the development set showed that our proposed model achieved a 0.537 pAP. On the test set, our system obtained a pAP score of 0.49.
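The passage-retrieval setup described above can be sketched with the Sentence-Transformers bi-encoder pattern below: encode the question and candidate passages, then rank passages by cosine similarity. The encoder checkpoint and the placeholder passages are assumptions; the authors' further training and model ensembling are not reproduced here.

```python
# Minimal sketch: bi-encoder passage retrieval with Sentence-Transformers.
# The base encoder and passages are placeholders, not the paper's trained models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("CAMeL-Lab/bert-base-arabic-camelbert-ca")  # assumed base encoder

question = "ما هي أركان الإسلام؟"                 # placeholder question
passages = ["نص المقطع الأول", "نص المقطع الثاني"]  # placeholder Qur'anic passages

q_emb = model.encode(question, convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)

scores = util.cos_sim(q_emb, p_emb)[0]              # cosine similarity to each passage
ranked = scores.argsort(descending=True).tolist()   # passage indices, best match first
print([(i, float(scores[i])) for i in ranked])
```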
2022
LK2022 at Qur’an QA 2022: Simple Transformers Model for Finding Answers to Questions from Qur’an
Abdullah Alsaleh | Saud Althabiti | Ibtisam Alshammari | Sarah Alnefaie | Sanaa Alowaidi | Alaa Alsaqer | Eric Atwell | Abdulrahman Altahhan | Mohammad Alsalka
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection
Question answering is a specialized area in the field of NLP that aims to extract the answer to a user question from a given text. Most studies in this area focus on the English language, while other languages, such as Arabic, are still at an early stage. Recently, research has turned to developing question answering systems for Arabic Islamic texts, which may pose challenges due to Classical Arabic. In this paper, we use the Simple Transformers Question Answering model with three Arabic pre-trained language models (AraBERT, CAMeLBERT, ArabicBERT) for the Qur'an Question Answering task using the Qur'anic Reading Comprehension Dataset. The model is set to return five answers, ranked from best to worst according to their probability scores, as specified in the task details. Our experiments with the development set show that the AraBERT V0.2 model outperformed the other Arabic pre-trained models. Therefore, AraBERT V0.2 was chosen for the test set, where it achieved fair results with a 0.45 pRR score, a 0.16 EM score, and a 0.42 F1 score.
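A minimal sketch of the Simple Transformers setup described above is shown below: an extractive QA model configured to return five ranked answers per question. The AraBERT checkpoint and the placeholder context/question are assumptions; real use would supply the Qur'anic Reading Comprehension Dataset in SQuAD-style format.

```python
# Minimal sketch: Simple Transformers extractive QA returning five ranked answers.
# Checkpoint and example inputs are assumed placeholders.
from simpletransformers.question_answering import QuestionAnsweringModel

model = QuestionAnsweringModel(
    "bert",
    "aubmindlab/bert-base-arabertv02",   # AraBERT v0.2 base (assumed checkpoint)
    args={"n_best_size": 5},             # return five answers ranked best to worst
    use_cuda=False,
)

to_predict = [{
    "context": "نص المقطع القرآني هنا",                      # placeholder passage
    "qas": [{"id": "q1", "question": "السؤال هنا"}],          # placeholder question
}]

answers, probabilities = model.predict(to_predict)
print(answers)        # five candidate answers for each question
print(probabilities)  # corresponding probability scores
```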
2021
Classifying Verses of the Quran using Doc2vec
Menwa Alshammeri | Eric Atwell | Mohammad Alsalka
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
The Quran, as a significant religious text, bears important spiritual and linguistic values. Understanding the text and inferring the underlying meanings entails semantic similarity analysis. We classified the verses of the Quran into 15 pre-defined categories or concepts, based on the Qurany corpus, using Doc2Vec and Logistic Regression. Our classifier scored 70% accuracy and a 60% F1-score using the distributed bag-of-words architecture. We then measured how semantically similar the documents within the same category are to each other and used this information to evaluate our model. We calculated the mean difference and average similarity values for each category to indicate how well our model describes that category.
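The pipeline described above (PV-DBOW document vectors fed to a Logistic Regression classifier) can be sketched as follows with gensim and scikit-learn. The verses, concept labels, and hyperparameters are placeholders, not the Qurany corpus setup itself.

```python
# Minimal sketch: Doc2Vec (PV-DBOW) features + Logistic Regression verse classifier.
# Corpus, labels, and hyperparameters are illustrative placeholders.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

verses = ["الحمد لله رب العالمين", "الرحمن الرحيم"]   # placeholder verses
labels = [3, 7]                                        # placeholder concept ids (out of 15)

docs = [TaggedDocument(words=v.split(), tags=[str(i)]) for i, v in enumerate(verses)]

# dm=0 selects the distributed bag-of-words (PV-DBOW) architecture.
d2v = Doc2Vec(documents=docs, dm=0, vector_size=100, epochs=40, min_count=1)

X = [d2v.infer_vector(v.split()) for v in verses]
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Predict the concept of an unseen verse from its inferred document vector.
print(clf.predict([d2v.infer_vector("مالك يوم الدين".split())]))
```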
2020
Automatic Hadith Segmentation using PPM Compression
Taghreed Tarmom | Eric Atwell | Mohammad Alsalka
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
In this paper we explore the use of a Prediction by Partial Matching (PPM) compression-based approach to segment Hadith into its two main components (Isnad and Matan). The experiments utilized the PPMD variant of PPM, showing that PPMD is effective in Hadith segmentation. It was also tested on Hadith corpora of different structures. In the first experiment we used the non-authentic Hadith (NAH) corpus for training models and testing, and in the second experiment we used the NAH corpus for training models and the Leeds University and King Saud University (LK) Hadith corpus for testing the PPMD segmenter. PPMD of order 7 achieved an accuracy of 92.76% and 90.10% in the first and second experiments, respectively.
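To make the compression-based segmentation idea concrete, the sketch below stands in for PPMD with a much simpler fixed-order character model using add-one smoothing: one model is trained on Isnad text, one on Matan text, and the split point that minimises the total code length of the two parts is chosen. The order, alphabet size, and training strings are assumed placeholders, not the authors' PPMD implementation.

```python
# Simplified stand-in for compression-based Hadith segmentation (not PPMD itself).
import math
from collections import defaultdict

ORDER = 3  # context length in characters (the paper uses PPMD of order 7)

def train(texts, order=ORDER):
    # Count character occurrences conditioned on the preceding `order` characters.
    counts = defaultdict(lambda: defaultdict(int))
    for t in texts:
        padded = " " * order + t
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts

def code_length(text, counts, order=ORDER, alphabet=64):
    # Bits needed to encode text under the model, with add-one smoothing.
    bits, padded = 0.0, " " * order + text
    for i in range(order, len(padded)):
        ctx, ch = padded[i - order:i], padded[i]
        total = sum(counts[ctx].values())
        p = (counts[ctx][ch] + 1) / (total + alphabet)
        bits += -math.log2(p)
    return bits

isnad_model = train(["حدثنا فلان عن فلان قال"])   # placeholder Isnad training text
matan_model = train(["قال رسول الله كذا وكذا"])    # placeholder Matan training text

def segment(hadith):
    # Choose the split minimising: bits(Isnad prefix) + bits(Matan suffix).
    best = min(range(1, len(hadith)),
               key=lambda k: code_length(hadith[:k], isnad_model)
                           + code_length(hadith[k:], matan_model))
    return hadith[:best], hadith[best:]

print(segment("حدثنا فلان عن فلان قال قال رسول الله كذا وكذا"))
```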