Modeling North African Dialects from Standard Languages

Yassine Toughrai, Kamel Smaïli, David Langlois


Abstract
Processing North African Arabic dialects presents significant challenges due to high lexical variability, frequent code-switching with French, and the use of both Arabic and Latin scripts. We address this with a phoneme-based normalization strategy that maps Arabic and French text into a simplified representation (Arabic rendered in Latin script), reflecting native reading patterns. Using this method, we pretrain BERT-based models on normalized Modern Standard Arabic and French only and evaluate them on Named Entity Recognition (NER) and text classification. Experiments show that normalized standard-language corpora yield competitive performance on North African dialect tasks; in zero-shot NER, Ar_20k surpasses dialect-pretrained baselines. Normalization improves vocabulary alignment, indicating that normalized standard corpora can suffice for developing dialect-supportive
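To make the idea of normalization improving vocabulary alignment concrete, the following is a minimal sketch and not the authors' implementation. The character table `AR2LAT`, the function names `normalize` and `vocab_overlap`, and the use of Jaccard overlap are all assumptions introduced for illustration; the abstract does not specify the actual phoneme mapping or the alignment metric.

```python
import unicodedata

# Hypothetical, deliberately tiny Arabic-to-Latin table (illustration only;
# the mapping used in the paper is not given in the abstract).
AR2LAT = {
    "ب": "b", "ت": "t", "س": "s", "ك": "k", "ل": "l", "م": "m",
    "ن": "n", "ر": "r", "د": "d", "ا": "a", "و": "w", "ي": "y",
}


def normalize(text: str) -> str:
    """Lowercase, drop combining marks, and map listed Arabic letters to Latin."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # removes French accents and Arabic diacritic marks
        out.append(AR2LAT.get(ch, ch))  # unlisted characters pass through
    return "".join(out)


def vocab_overlap(tokens_a, tokens_b) -> float:
    """Jaccard overlap between the vocabularies of two token lists."""
    va, vb = set(tokens_a), set(tokens_b)
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0


# Toy example: the same word written in Arabic script and in Latin script.
standard = ["كتاب", "café"]
dialect = ["ktab", "cafe"]

raw = vocab_overlap(standard, dialect)
norm = vocab_overlap([normalize(t) for t in standard],
                     [normalize(t) for t in dialect])
```

In this toy setting the two vocabularies share no tokens before normalization and coincide afterwards, which is the kind of effect the abstract attributes to normalization. A real system would need a far richer mapping than this table.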
Anthology ID:
2025.arabicnlp-main.30
Volume:
Proceedings of The Third Arabic Natural Language Processing Conference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue:
ArabicNLP
Publisher:
Association for Computational Linguistics
Pages:
375–383
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.30/
Cite (ACL):
Yassine Toughrai, Kamel Smaïli, and David Langlois. 2025. Modeling North African Dialects from Standard Languages. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 375–383, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Modeling North African Dialects from Standard Languages (Toughrai et al., ArabicNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.30.pdf