Lahjawi: Arabic Cross-Dialect Translator
Mohamed Motasim Hamed, Muhammad Hreden, Khalil Hennara, Zeina Aldallal, Sara Chrouf, Safwan AlModhayan
Abstract
In this paper, we explore the rich diversity of Arabic dialects by introducing a suite of pioneering models called Lahjawi. The primary model, Lahjawi-D2D, is the first designed for cross-dialect translation among 15 Arabic dialects. Furthermore, we introduce Lahjawi-D2MSA, a model designed to convert any Arabic dialect into Modern Standard Arabic (MSA). Both models are fine-tuned versions of Kuwain-1.5B, an in-house small language model tailored to Arabic linguistic characteristics. We provide a detailed overview of Lahjawi’s architecture and training methods, along with a comprehensive evaluation of its performance. The results demonstrate Lahjawi’s success in preserving meaning and style, with BLEU scores of 9.62 for dialect-to-MSA and 9.88 for dialect-to-dialect tasks. Additionally, human evaluation reveals an accuracy score of 58% and a fluency score of 78%, underscoring Lahjawi’s robust handling of diverse dialectal nuances. This research sets a foundation for future advancements in Arabic NLP and cross-dialect communication technologies.
- Anthology ID:
- 2025.wacl-1.2
- Volume:
- Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Saad Ezzini, Hamza Alami, Ismail Berrada, Abdessamad Benlahbib, Abdelkader El Mahdaouy, Salima Lamsiyah, Hatim Derrouz, Amal Haddad Haddad, Mustafa Jarrar, Mo El-Haj, Ruslan Mitkov, Paul Rayson
- Venues:
- WACL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 12–24
- URL:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2025.wacl-1.2/
- Cite (ACL):
- Mohamed Motasim Hamed, Muhammad Hreden, Khalil Hennara, Zeina Aldallal, Sara Chrouf, and Safwan AlModhayan. 2025. Lahjawi: Arabic Cross-Dialect Translator. In Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4), pages 12–24, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- Lahjawi: Arabic Cross-Dialect Translator (Hamed et al., WACL 2025)
- PDF:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2025.wacl-1.2.pdf