Reem Suwaileh


2021

pdf bib
ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection
Fatima Haouari | Maram Hasanain | Reem Suwaileh | Tamer Elsayed
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K relevant tweets to those claims. Tweets were manually-annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic. ArCOV19-Rumors supports two levels of misinformation detection over Twitter: verifying free-text claims (called claim-level verification) and verifying claims expressed in tweets (called tweet-level verification). Our dataset covers, in addition to health, claims related to other topical categories that were influenced by COVID-19, namely, social, politics, sports, entertainment, and religious. Moreover, we present benchmarking results for tweet-level verification on the dataset. We experimented with SOTA models of versatile approaches that either exploit content, user profiles features, temporal features and propagation structure of the conversational threads for tweet verification.

pdf bib
ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks
Fatima Haouari | Maram Hasanain | Reem Suwaileh | Tamer Elsayed
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that spans one year, covering the period from 27th of January 2020 till 31st of January 2021. ArCOV-19 is the first publicly-available Arabic Twitter dataset covering COVID-19 pandemic that includes about 2.7M tweets alongside the propagation networks of the most-popular subset of them (i.e., most-retweeted and -liked). The propagation networks include both retweetsand conversational threads (i.e., threads of replies). ArCOV-19 is designed to enable research under several domains including natural language processing, information retrieval, and social computing. Preliminary analysis shows that ArCOV-19 captures rising discussions associated with the first reported cases of the disease as they appeared in the Arab world.In addition to the source tweets and the propagation networks, we also release the search queries and the language-independent crawler used to collect the tweets to encourage the curation of similar datasets.

2020

pdf bib
Are We Ready for this Disaster? Towards Location Mention Recognition from Crisis Tweets
Reem Suwaileh | Muhammad Imran | Tamer Elsayed | Hassan Sajjad
Proceedings of the 28th International Conference on Computational Linguistics

The widespread usage of Twitter during emergencies has provided a new opportunity and timely resource to crisis responders for various disaster management tasks. Geolocation information of pertinent tweets is crucial for gaining situational awareness and delivering aid. However, the majority of tweets do not come with geoinformation. In this work, we focus on the task of location mention recognition from crisis-related tweets. Specifically, we investigate the influence of different types of labeled training data on the performance of a BERT-based classification model. We explore several training settings such as combing in- and out-domain data from news articles and general-purpose and crisis-related tweets. Furthermore, we investigate the effect of geospatial proximity while training on near or far-away events from the target event. Using five different datasets, our extensive experiments provide answers to several critical research questions that are useful for the research community to foster research in this important direction. For example, results show that, for training a location mention recognition model, Twitter-based data is preferred over general-purpose data; and crisis-related data is preferred over general-purpose Twitter data. Furthermore, training on data from geographically-nearby disaster events to the target event boosts the performance compared to training on distant events.

2018

pdf bib
DART: A Large Dataset of Dialectal Arabic Tweets
Israa Alsarsour | Esraa Mohamed | Reem Suwaileh | Tamer Elsayed
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)