Imane Momayiz
2025
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
Guokan Shang | Hadi Abdine | Yousef Khoubrane | Amr Mohamed | Yassine Abbahaddou | Sofiane Ennadir | Imane Momayiz | Xuguang Ren | Eric Moulines | Preslav Nakov | Michalis Vazirgiannis | Eric Xing
Proceedings of the First Workshop on Language Models for Low-Resource Languages
We introduce Atlas-Chat, the first-ever collection of LLMs specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating novel datasets both manually and synthetically, and translating English instructions with stringent quality control. Atlas-Chat-2B, 9B, and 27B models, fine-tuned on the dataset, exhibit superior ability in following Darija instructions and performing standard NLP tasks. Notably, our models outperform both state-of-the-art and Arabic-specialized LLMs like LLaMa, Jais, and AceGPT, e.g., our 9B model gains a 13% performance boost over a larger 13B model on DarijaMMLU, in our newly introduced evaluation suite for Darija covering both discriminative and generative tasks. Furthermore, we perform an experimental analysis of various fine-tuning strategies and base model choices to determine optimal configurations. All our resources are publicly accessible, and we believe our work offers comprehensive design methodologies of instruction-tuning for low-resource languages, which are often neglected in favor of data-rich languages by contemporary LLMs.
AtlasIA at SemEval-2025 Task 11: FastText-Based Emotion Detection in Moroccan Arabic for Low-Resource Settings
Abdeljalil El Majjodi | Imane Momayiz | Nouamane Tazi
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This study addresses multi-label emotion classification in Moroccan Arabic. We developed a lightweight computational approach to detect and categorize emotional content in seven distinct categories: anger, fear, joy, disgust, sadness, surprise, and neutral. Our findings reveal that our efficient, subword-aware model achieves 46.44% accuracy on the task, demonstrating the viability of lightweight approaches for emotion recognition in under-resourced language variants. The model’s performance, while modest, establishes a baseline for emotion detection in Moroccan Arabic, highlighting both the potential and challenges of applying computationally efficient architectures to dialectal Arabic processing. Our analysis reveals particular strengths in handling morphological variations and out-of-vocabulary words, though challenges persist in managing code-switching and subtle emotional distinctions. These results offer valuable insights into the trade-offs between speed and accuracy in multilingual emotion detection systems, particularly for low-resource languages.
Co-authors
- Yassine Abbahaddou 1
- Hadi Abdine 1
- Abdeljalil El Majjodi 1
- Sofiane Ennadir 1
- Yousef Khoubrane 1