@inproceedings{alharbi-lee-2021-kawarith,
    title = "Kawarith: an {A}rabic {T}witter Corpus for Crisis Events",
    author = "Alharbi, Alaa  and
      Lee, Mark",
    editor = "Habash, Nizar  and
      Bouamor, Houda  and
      Hajj, Hazem  and
      Magdy, Walid  and
      Zaghouani, Wajdi  and
      Bougares, Fethi  and
      Tomeh, Nadi  and
      Abu Farha, Ibrahim  and
      Touileb, Samia",
    booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine (Virtual)",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.wanlp-1.5/",
    pages = "42--52",
    abstract = "Social media (SM) platforms such as Twitter provide large quantities of real-time data that can be leveraged during mass emergencies. Developing tools to support crisis-affected communities requires available datasets, which often do not exist for low-resource languages. This paper introduces Kawarith, a multi-dialect Arabic Twitter corpus for crisis events, comprising more than a million Arabic tweets collected during 22 crises that occurred between 2018 and 2020 and involved several types of hazard. Exploration of this content revealed the most discussed topics and information types, and the paper presents a labelled dataset from seven emergency events that serves as a gold standard for several tasks in crisis informatics research. Using annotated data from the same event, a BERT model is fine-tuned to classify tweets into different categories in the multi-label setting. Results show that BERT-based models yield good performance on this task even with small amounts of task-specific training data."
}
[Kawarith: an Arabic Twitter Corpus for Crisis Events](https://preview.aclanthology.org/ingest-emnlp/2021.wanlp-1.5/) (Alharbi & Lee, WANLP 2021)