@inproceedings{bezancon-etal-2025-lost,
    title = "Lost in Variation: An Unsupervised Methodology for Mining Lexico-syntactic Patterns in Middle {A}rabic Texts",
    author = {Bezan{\c{c}}on, Julien  and
      Karam, Rimane  and
      Lejeune, Ga{\"e}l},
    editor = "Ezzini, Saad  and
      Alami, Hamza  and
      Berrada, Ismail  and
      Benlahbib, Abdessamad  and
      El Mahdaouy, Abdelkader  and
      Lamsiyah, Salima  and
      Derrouz, Hatim  and
      Haddad Haddad, Amal  and
      Jarrar, Mustafa  and
      El-Haj, Mo  and
      Mitkov, Ruslan  and
      Rayson, Paul",
    booktitle = "Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.wacl-1.3/",
    pages = "25--37",
    abstract = "While MSA and some dialects of Arabic have been extensively studied in NLP, Middle Arabic remains largely unexplored in the field. Middle Arabic nevertheless raises issues that are still not addressed: it is characterized by variation, since it mixes standard features, colloquial features, and features that belong to neither of the two. Here, we introduce a methodology to identify, extract and rank variations of 13 manually retrieved formulas. These formulas come from the first nine booklets of S{\=I}RAT AL-MALIK AL-{\d{Z}}{\=A}HIR BAYBAR{\d{S}}, a corpus of Damascene popular literature written in Middle Arabic and composed of 53,843 sentences. In total, we ranked 20, sequences according to their similarity with the original formulas on multiple linguistic layers. We observed that the variation in these formulas occurs at the lexical, morphological and graphical levels, whereas the semantic and syntactic levels remain strictly invariant."
}