Mohammad Alsalka


2022

pdf
LK2022 at Qur’an QA 2022: Simple Transformers Model for Finding Answers to Questions from Qur’an
Abdullah Alsaleh | Saud Althabiti | Ibtisam Alshammari | Sarah Alnefaie | Sanaa Alowaidi | Alaa Alsaqer | Eric Atwell | Abdulrahman Altahhan | Mohammad Alsalka
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

Question answering is a specialized area in the field of NLP that aims to extract the answer to a user question from a given text. Most studies in this area focus on the English language, while other languages, such as Arabic, are still in their early stage. Recently, research tend to develop question answering systems for Arabic Islamic texts, which may impose challenges due to Classical Arabic. In this paper, we use Simple Transformers Question Answering model with three Arabic pre-trained language models (AraBERT, CAMeL-BERT, ArabicBERT) for Qur’an Question Answering task using Qur’anic Reading Comprehension Dataset. The model is set to return five answers ranking from the best to worst based on their probability scores according to the task details. Our experiments with development set shows that AraBERT V0.2 model outperformed the other Arabic pre-trained models. Therefore, AraBERT V0.2 was chosen for the the test set and it performed fair results with 0.45 pRR score, 0.16 EM score and 0.42 F1 score.

2021

pdf
Classifying Verses of the Quran using Doc2vec
Menwa Alshammeri | Eric Atwell | Mohammad Alsalka
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The Quran, as a significant religious text, bears important spiritual and linguistic values. Understanding the text and inferring the underlying meanings entails semantic similarity analysis. We classified the verses of the Quran into 15 pre-defined categories or concepts, based on the Qurany corpus, using Doc2Vec and Logistic Regression. Our classifier scored 70% accuracy, and 60% F1-score using the distributed bag-of-words architecture. We then measured how similar the documents within the same category are to each other semantically and use this information to evaluate our model. We calculated the mean difference and average similarity values for each category to indicate how well our model describes that category.

2020

pdf
Automatic Hadith Segmentation using PPM Compression
Taghreed Tarmom | Eric Atwell | Mohammad Alsalka
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper we explore the use of Prediction by partial matching (PPM) compression based to segment Hadith into its two main components (Isnad and Matan). The experiments utilized the PPMD variant of the PPM, showing that PPMD is effective in Hadith segmentation. It was also tested on Hadith corpora of different structures. In the first experiment we used the non- authentic Hadith (NAH) corpus for train- ing models and testing, and in the second experiment we used the NAH corpus for training models and the Leeds University and King Saud University (LK) Hadith cor- pus for testing PPMD segmenter. PPMD of order 7 achieved an accuracy of 92.76% and 90.10% in the first and second experiments, respectively.