@inproceedings{al-sabbagh-girju-2012-yadac,
    title = "{YADAC}: Yet another Dialectal {A}rabic Corpus",
    author = "Al-Sabbagh, Rania  and
      Girju, Roxana",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Declerck, Thierry  and
      Do{\u{g}}an, Mehmet U{\u{g}}ur  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)",
    month = may,
    year = "2012",
    address = "Istanbul, Turkey",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/iwcs-25-ingestion/L12-1387/",
    pages = "2882--2889",
    abstract = "This paper presents the first phase of building YADAC {\textemdash} a multi-genre Dialectal Arabic (DA) corpus {\textemdash} that is compiled using Web data from microblogs (i.e. Twitter), blogs/forums and online knowledge market services in which both questions and answers are user-generated. In addition to introducing two new genres to the current efforts of building DA corpora (i.e. microblogs and question-answer pairs extracted from online knowledge market services), the paper highlights and tackles several new issues related to building DA corpora that have not been handled in previous studies: function-based Web harvesting and dialect identification, vowel-based spelling variation, linguistic hypercorrection and its effect on spelling variation, unsupervised Part-of-Speech (POS) tagging and base phrase chunking for DA. Although the algorithms for both POS tagging and base-phrase chunking are still under development, the results are promising."
}