Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data
Chatrine Qwaider, Kirill Chirkunov, Bashar Alhafni, Nizar Habash, Ted Briscoe
Abstract
Prompt relevance is a critical yet underexplored dimension in Arabic Automated Essay Scoring (AES). We present the first systematic study of binary prompt-essay relevance classification, supporting both AES scoring and dataset annotation. To address data scarcity, we built a synthetic dataset of on-topic and off-topic pairs and evaluated multiple models, including threshold-based classifiers, SVMs, causal LLMs, and a fine-tuned masked SBERT model. For real-data evaluation, we combined QAES with ZAEBUC, creating off-topic pairs via mismatched prompts. We also tested prompt expansion strategies using AraVec, CAMeL, and GPT-4o. Our fine-tuned SBERT achieved 98% F1 on synthetic data and strong results on QAES+ZAEBUC, outperforming SVMs and threshold-based baselines and offering a resource-efficient alternative to LLMs. This work establishes the first benchmark for Arabic prompt relevance and provides practical strategies for low-resource AES.
- Anthology ID:
- 2025.arabicnlp-main.13
- Volume:
- Proceedings of The Third Arabic Natural Language Processing Conference
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
- Venue:
- ArabicNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 162–178
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.arabicnlp-main.13/
- DOI:
- 10.18653/v1/2025.arabicnlp-main.13
- Cite (ACL):
- Chatrine Qwaider, Kirill Chirkunov, Bashar Alhafni, Nizar Habash, and Ted Briscoe. 2025. Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 162–178, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating Prompt Relevance in Arabic Automatic Essay Scoring: Insights from Synthetic and Real-World Data (Qwaider et al., ArabicNLP 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.arabicnlp-main.13.pdf
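
The abstract mentions creating off-topic pairs via mismatched prompts and evaluating threshold-based classifiers. The following is a minimal sketch of that general idea, not the authors' implementation. The function names (`make_relevance_pairs`, `bow_cosine`, `is_on_topic`), the threshold value, and the bag-of-words similarity are all assumptions made for illustration; the paper's systems rely on learned representations such as SBERT rather than token counts, and the toy strings are in English only for readability.

```python
import random
from collections import Counter
from math import sqrt


def make_relevance_pairs(essays, prompts, seed=0):
    """Build labeled (prompt_text, essay_text, label) triples.

    essays:  list of (prompt_id, essay_text)
    prompts: dict mapping prompt_id -> prompt_text
    Label 1 = on-topic (original prompt), 0 = off-topic (mismatched prompt).
    """
    rng = random.Random(seed)
    pairs = []
    for prompt_id, essay in essays:
        # On-topic: the essay paired with the prompt it was written for.
        pairs.append((prompts[prompt_id], essay, 1))
        # Off-topic: the same essay paired with a different, randomly chosen prompt.
        others = [pid for pid in prompts if pid != prompt_id]
        if others:
            pairs.append((prompts[rng.choice(others)], essay, 0))
    return pairs


def bow_cosine(a, b):
    """Cosine similarity over whitespace-token counts (a stand-in for embeddings)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def is_on_topic(prompt, essay, threshold=0.2, similarity=bow_cosine):
    """Threshold-based decision: relevant if similarity reaches the threshold."""
    return similarity(prompt, essay) >= threshold


if __name__ == "__main__":
    prompts = {
        "p1": "describe your favorite city",
        "p2": "benefits of regular exercise",
    }
    essays = [
        ("p1", "my favorite city is Amman"),
        ("p2", "regular exercise improves health"),
    ]
    for prompt, essay, label in make_relevance_pairs(essays, prompts):
        print(label, is_on_topic(prompt, essay), "|", prompt, "|", essay)
```

In practice the threshold would be tuned on held-out pairs, and `similarity` could be swapped for a function that compares sentence embeddings; the pair-construction step stays the same either way.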