Wafa Abdullah Alrajhi


2022

Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs
Wafa Abdullah Alrajhi | Hend Al-Khalifa | Abdulmalik AlSalman
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Despite the noticeable recent progress in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conduct a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using minimal pairs (MPs), where each pair consists of a grammatical sentence and a minimally different ungrammatical counterpart. Each MP isolates a specific linguistic phenomenon, testing the model’s sensitivity to that phenomenon. We cover nine major Arabic phenomena, including verbal sentences, nominal sentences, adjective modification, and Idafa construction. The experiments compare fifteen Arabic BERT-based PLMs. Overall, CAMeL-CA outperformed the other PLMs, achieving the highest overall accuracy.
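
To make the minimal-pair setup concrete, the sketch below shows one common way such tests are scored with a masked LM: each sentence receives a pseudo-log-likelihood (the sum of the log-probability of every token when it is masked in turn), and the model "passes" a pair if the grammatical sentence scores higher than its ungrammatical counterpart. This is a minimal sketch assuming the Hugging Face transformers library; the checkpoint name, the pseudo-log-likelihood scoring, and the helper functions are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Minimal-pair evaluation sketch with a masked LM (assumed procedure, not the
# paper's exact method). Requires: torch, transformers.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-ca"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += log_probs[input_ids[i]].item()
    return total


def model_prefers_grammatical(good: str, bad: str) -> bool:
    """True if the grammatical member of the pair scores higher."""
    return pseudo_log_likelihood(good) > pseudo_log_likelihood(bad)
```

Accuracy on a phenomenon is then simply the fraction of its minimal pairs for which model_prefers_grammatical returns True.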