IDRISI-RA: The First Arabic Location Mention Recognition Dataset of Disaster Tweets

Reem Suwaileh, Muhammad Imran, Tamer Elsayed


Abstract
Extracting geolocation information from social media data enables effective disaster management, as it helps response authorities, for example, to locate incidents when planning rescue activities and to locate affected people for evacuation. Nevertheless, geolocation extraction is greatly understudied for low-resource languages such as Arabic. To fill this gap, we introduce IDRISI-RA, the first publicly available Arabic Location Mention Recognition (LMR) dataset, which provides human-labeled and automatically labeled versions on the order of thousands and millions of tweets, respectively. It contains both location mentions and their types (e.g., district, city). Our extensive analysis shows that IDRISI-RA has decent geographical, domain, location-granularity, temporal, and dialectal coverage. Furthermore, we establish baselines using standard Arabic NER models and build two simple yet effective LMR models. Our rigorous experiments confirm the need for developing specific models for Arabic LMR in the disaster domain. Moreover, experiments show the promising domain and geographical generalizability of IDRISI-RA under zero-shot learning.
Anthology ID:
2023.acl-long.901
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
16298–16317
URL:
https://aclanthology.org/2023.acl-long.901
DOI:
10.18653/v1/2023.acl-long.901
Cite (ACL):
Reem Suwaileh, Muhammad Imran, and Tamer Elsayed. 2023. IDRISI-RA: The First Arabic Location Mention Recognition Dataset of Disaster Tweets. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16298–16317, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
IDRISI-RA: The First Arabic Location Mention Recognition Dataset of Disaster Tweets (Suwaileh et al., ACL 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.acl-long.901.pdf
Video:
https://preview.aclanthology.org/ingest-2024-clasp/2023.acl-long.901.mp4