Efficient Adaptation of English Language Models for Morphologically Rich and Underrepresented Languages: The Case of Arabic

Ahmed Samy Eldamaty, Mohamed Maher Zenhom Abdelrahman, Mohamed Mostafa Ibrahim Elbehery, Mariam Ashraf, Radwa Elshawi


Abstract
Transformer-based language models have revolutionized NLP, yet their adaptation to morphologically rich and dialectally diverse languages such as Arabic remains non-trivial. We introduce ModernAraBERT, a resource-efficient adaptation of the English-pretrained ModernBERT for Arabic, employing continued pretraining on large Arabic corpora followed by lightweight head-only fine-tuning with a frozen encoder. This strategy retains cross-lingual knowledge while capturing Arabic morphology and orthographic variation, offering a scalable alternative to training monolingual models from scratch. We evaluate ModernAraBERT on three representative Arabic NLP tasks: sentiment analysis, named entity recognition, and extractive question answering, against strong Arabic-specific and multilingual baselines (AraBERTv1, AraBERTv2, MARBERT, mBERT). Across all tasks, ModernAraBERT achieves consistent and often substantial improvements, particularly for sentence- and token-level understanding, demonstrating that modern English encoder architectures can be efficiently transferred to Arabic through language-adaptive pretraining. Beyond Arabic, our findings highlight a generalizable paradigm for extending state-of-the-art models to morphologically complex and underrepresented languages with reduced computational overhead.
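The head-only fine-tuning described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the toy encoder below merely stands in for the continued-pretrained ModernBERT encoder, and the 3-way head, dimensions, and batch are hypothetical. It shows the core mechanic, freezing every encoder parameter so that gradient updates reach only the lightweight task head.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained encoder (hypothetical; in the paper's setup
# this would be the continued-pretrained ModernBERT encoder).
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def forward(self, token_ids):
        return self.layer(self.embed(token_ids))  # (batch, seq, dim)

encoder = ToyEncoder()

# Freeze the encoder: no gradients flow into its parameters,
# so only the task head is trained.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(64, 3)  # e.g. a 3-way sentiment classification head
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Dummy batch of token ids and labels (illustrative only).
token_ids = torch.randint(0, 1000, (8, 16))
labels = torch.randint(0, 3, (8,))

logits = head(encoder(token_ids).mean(dim=1))  # mean-pool, then classify
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in encoder.parameters())
print(f"trainable params: {trainable}, frozen params: {frozen}")
```

After `backward()`, only the head's parameters carry gradients, which is what makes this adaptation step cheap relative to full fine-tuning.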
Anthology ID:
2026.lrec-main.822
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resources Association
Pages:
10485–10496
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.822/
Cite (ACL):
Ahmed Samy Eldamaty, Mohamed Maher Zenhom Abdelrahman, Mohamed Mostafa Ibrahim Elbehery, Mariam Ashraf, and Radwa Elshawi. 2026. Efficient Adaptation of English Language Models for Morphologically Rich and Underrepresented Languages: The Case of Arabic. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 10485–10496, Palma de Mallorca, Spain.
Cite (Informal):
Efficient Adaptation of English Language Models for Morphologically Rich and Underrepresented Languages: The Case of Arabic (Eldamaty et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.822.pdf