3LM: Bridging Arabic, STEM, and Code through Benchmarking

Basma El Amel Boussaha, Leen Al Qadi, Mugariya Farooq, Shaikha Alsuwaidi, Giulia Campesan, Ahmed Alzubaidi, Mohammed Alyafeai, Hakim Hacid


Abstract
Arabic is one of the most widely spoken languages in the world, yet efforts to develop and evaluate Large Language Models (LLMs) for Arabic remain relatively limited. Most existing Arabic benchmarks focus on linguistic, cultural, or religious content, leaving a significant gap in areas like STEM and coding domains that are increasingly relevant for real-world LLM applications. To help bridge this gap, we present 3LM, a suite of three benchmarks designed specifically for Arabic. The first is a set of STEM-related question-answer pairs, naturally sourced from Arabic textbooks and educational worksheets. The second consists of synthetically generated STEM questions, created using the same sources. The third benchmark focuses on code generation, built through a careful translation of two widely used code benchmarks, incorporating a human-in-the-loop process with several rounds of review to ensure high-quality and faithful translations. We release all three benchmarks publicly to support the growth of Arabic LLM research in these essential but underrepresented areas.
Anthology ID:
2025.arabicnlp-main.4
Volume:
Proceedings of The Third Arabic Natural Language Processing Conference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue:
ArabicNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
42–63
Language:
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.4/
DOI:
Bibkey:
Cite (ACL):
Basma El Amel Boussaha, Leen Al Qadi, Mugariya Farooq, Shaikha Alsuwaidi, Giulia Campesan, Ahmed Alzubaidi, Mohammed Alyafeai, and Hakim Hacid. 2025. 3LM: Bridging Arabic, STEM, and Code through Benchmarking. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 42–63, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
3LM: Bridging Arabic, STEM, and Code through Benchmarking (Boussaha et al., ArabicNLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.4.pdf