@inproceedings{toughrai-etal-2025-modeling,
    title = "Modeling {N}orth {A}frican Dialects from Standard Languages",
    author = {Toughrai, Yassine  and
      Sma{\"i}li, Kamel  and
      Langlois, David},
    editor = "Darwish, Kareem  and
      Ali, Ahmed  and
      Abu Farha, Ibrahim  and
      Touileb, Samia  and
      Zitouni, Imed  and
      Abdelali, Ahmed  and
      Al-Ghamdi, Sharefah  and
      Alkhereyf, Sakhar  and
      Zaghouani, Wajdi  and
      Khalifa, Salam  and
      AlKhamissi, Badr  and
      Almatham, Rawan  and
      Hamed, Injy  and
      Alyafeai, Zaid  and
      Alowisheq, Areeb  and
      Inoue, Go  and
      Mrini, Khalil  and
      Alshammari, Waad",
    booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-main.30/",
    pages = "375--383",
    ISBN = "979-8-89176-352-4",
    abstract = "Processing North African Arabic dialects presents significant challenges due to high lexical variability, frequent code-switching with French, and the use of both Arabic and Latin scripts. We address this with a phoneme-based normalization strategy that maps Arabic and French text into a simplified representation (Arabic rendered in Latin script), reflecting native reading patterns. Using this method, we pretrain BERT-based models on normalized Modern Standard Arabic and French only and evaluate them on Named Entity Recognition (NER) and text classification. Experiments show that normalized standard-language corpora yield competitive performance on North African dialect tasks; in zero-shot NER, Ar{\_}20k surpasses dialect-pretrained baselines. Normalization improves vocabulary alignment, indicating that normalized standard corpora can suffice for developing dialect-supportive"
}