Safwan AlModhayan


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Lahjawi: Arabic Cross-Dialect Translator
Mohamed Motasim Hamed | Muhammad Hreden | Khalil Hennara | Zeina Aldallal | Sara Chrouf | Safwan AlModhayan
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)

In this paper, we explore the rich diversity of Arabic dialects by introducing a suite of pioneering models called Lahjawi. The primary model, Lahjawi-D2D, is the first designed for cross-dialect translation among 15 Arabic dialects. Furthermore, we introduce Lahjawi-D2MSA, a model designed to convert any Arabic dialect into Modern Standard Arabic (MSA). Both models are fine-tuned versions of Kuwain-1.5B an in-house built small language model, tailored for Arabic linguistic characteristics. We provide a detailed overview of Lahjawi’s architecture and training methods, along with a comprehensive evaluation of its performance. The results demonstrate Lahjawi’s success in preserving meaning and style, with BLEU scores of 9.62 for dialect-to-MSA and 9.88 for dialect-to- dialect tasks. Additionally, human evaluation reveals an accuracy score of 58% and a fluency score of 78%, underscoring Lahjawi’s robust handling of diverse dialectal nuances. This research sets a foundation for future advancements in Arabic NLP and cross-dialect communication technologies.