Kareem Mohamed Darwish
2025
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan Nasser Almatham | Kareem Mohamed Darwish | Raghad Al-Rasheed | Waad Thuwaini Alshammari | Muneera Alhoshan | Amal Almazrua | Asma Al Wazrah | Mais Alheraki | Firoj Alam | Preslav Nakov | Norah A. Alzahrani | Eman Albilali | Nizar Habash | Abdelrahman Mustafa El-Sheikh | Muhammad Elmallah | Hamdy Mubarak | Zaid Alyafeai | Mohamed Anwar | Haonan Li | Ahmed Abdelali | Nora Altwairesh | Maram Hasanain | Abdulmohsen Al-Thubaity | Shady Shehata | Bashar Alhafni | Injy Hamed | Go Inoue | Khalid N. Elmadani | Ossama Obeid | Fatima Haouari | Tamer Elsayed | Emad A. Alghamdi | Khalid Almubarak | Saied Alshahrani | Ola Aljareh | Safa Alajlan | Areej Alshaqarawi | Maryam Alshihri | Sultana Alghurabi | Atikah Alzeghayer | Afrah Altamimi | Abdullah Alfaifi | Abdulrahman M Alosaimy
Proceedings of The Third Arabic Natural Language Processing Conference
The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, owing to factors such as data scarcity, the linguistic diversity of Arabic and its dialects, and morphological complexity. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development examples, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.
Tool Calling for Arabic LLMs: Data Strategies and Instruction Tuning
Asım Ersoy | Enes Altinisik | Kareem Mohamed Darwish | Husrev Taha Sencar
Proceedings of The Third Arabic Natural Language Processing Conference
Tool calling is a critical capability that allows Large Language Models (LLMs) to interact with external systems, significantly expanding their utility. However, research and resources for tool calling are predominantly English-centric, leaving a gap in our understanding of how to enable this functionality for other languages, such as Arabic. This paper investigates three key research questions: (1) the necessity of in-language (Arabic) tool-calling data versus relying on cross-lingual transfer, (2) the effect of general-purpose instruction tuning on tool-calling performance, and (3) the value of fine-tuning on specific, high-priority tools. To address these questions, we conduct extensive experiments using base and post-trained variants of an open-weight Arabic LLM. To enable this study, we bridge the resource gap by translating and adapting two open-source tool-calling datasets into Arabic. Our findings provide crucial insights into the optimal strategies for developing robust tool-augmented agents for Arabic.
IslamicEval 2025: The First Shared Task of Capturing LLMs Hallucination in Islamic Content
Hamdy Mubarak | Rana Malhas | Watheq Mansour | Abubakr Mohamed | Mahmoud Fawzi | Majd Hawasly | Tamer Elsayed | Kareem Mohamed Darwish | Walid Magdy
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Co-authors
- Tamer Elsayed 2
- Hamdy Mubarak 2
- Ahmed Abdelali 1
- Asma Al Wazrah 1
- Raghad Al-Rasheed 1
- Abdulmohsen Al-Thubaity 1
- Safa Alajlan 1
- Firoj Alam 1
- Eman Albilali 1
- Abdullah Alfaifi 1
- Emad A. Alghamdi 1
- Sultana Alghurabi 1
- Bashar Alhafni 1
- Mais Alheraki 1
- Muneera Alhoshan 1
- Ola Aljareh 1
- Rawan Nasser Almatham 1
- Amal Almazrua 1
- Khalid Almubarak 1
- Abdulrahman M Alosaimy 1
- Saied Alshahrani 1
- Waad Thuwaini Alshammari 1
- Areej Alshaqarawi 1
- Maryam Alshihri 1
- Afrah Altamimi 1
- Enes Altinisik 1
- Nora Altwairesh 1
- Zaid Alyafeai 1
- Norah A. Alzahrani 1
- Atikah Alzeghayer 1
- Mohamed Anwar 1
- Abdelrahman Mustafa El-Sheikh 1
- Khalid N. Elmadani 1
- Muhammad Elmallah 1
- Asım Ersoy 1
- Mahmoud Fawzi 1
- Nizar Habash 1
- Injy Hamed 1
- Fatima Haouari 1
- Maram Hasanain 1
- Majd Hawasly 1
- Go Inoue 1
- Haonan Li 1
- Walid Magdy 1
- Rana Malhas 1
- Watheq Mansour 1
- Abubakr Mohamed 1
- Preslav Nakov 1
- Ossama Obeid 1
- Husrev Taha Sencar 1
- Shady Shehata 1