Command R7B Arabic: a small, enterprise-focused, multilingual, and culturally aware Arabic LLM

Yazeed Alnumay, Alexandre Barbet, Anna Bialas, William Darling, Shaan Desai, Joan Devassy, Kyle Duffy, Stephanie Howe, Olivia Lasche, Justin Lee, Anirudh Shrinivason, Jennifer Tracey


Abstract
Building high-quality large language models (LLMs) for enterprise Arabic applications remains challenging due to the limited availability of digitized Arabic data. In this work, we present a data synthesis and refinement strategy to help address this problem: we leverage synthetic data generation and human-in-the-loop annotation to expand our Arabic training corpus. We further present our iterative post-training recipe, which is essential to achieving state-of-the-art performance in aligning the model with human preferences, a critical aspect of enterprise use cases. The culmination of this effort is the release of a small, 7B-parameter, open-weight model that outperforms similarly sized peers in head-to-head comparisons and on Arabic-focused benchmarks covering cultural knowledge, instruction following, retrieval-augmented generation (RAG), and contextual faithfulness.
Anthology ID:
2025.africanlp-1.17
Volume:
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Constantine Lignos, Idris Abdulmumin, David Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
126–135
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.africanlp-1.17/
DOI:
10.18653/v1/2025.africanlp-1.17
Cite (ACL):
Yazeed Alnumay, Alexandre Barbet, Anna Bialas, William Darling, Shaan Desai, Joan Devassy, Kyle Duffy, Stephanie Howe, Olivia Lasche, Justin Lee, Anirudh Shrinivason, and Jennifer Tracey. 2025. Command R7B Arabic: a small, enterprise-focused, multilingual, and culturally aware Arabic LLM. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 126–135, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Command R7B Arabic: a small, enterprise-focused, multilingual, and culturally aware Arabic LLM (Alnumay et al., AfricaNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.africanlp-1.17.pdf