Abstract
Extracting and disambiguating geolocation information from social media data enables effective disaster management by helping response authorities, for example, to locate incidents for planning rescue activities and affected people for evacuation. Nevertheless, the dearth of resources and tools hinders the development and evaluation of Location Mention Disambiguation (LMD) models in the disaster management domain. Consequently, the LMD task is greatly understudied, especially for low-resource languages such as Arabic. To fill this gap, we introduce IDRISI-D, the largest-to-date English and the first Arabic public LMD datasets. Additionally, we introduce a modified hierarchical evaluation framework that offers a lenient and nuanced evaluation of LMD systems. We further benchmark the IDRISI-D datasets using representative baselines and show the competitiveness of BERT-based models.
- Anthology ID:
- 2023.arabicnlp-1.14
- Volume:
- Proceedings of ArabicNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore (Hybrid)
- Editors:
- Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
- Venues:
- ArabicNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 158–169
- URL:
- https://aclanthology.org/2023.arabicnlp-1.14
- DOI:
- 10.18653/v1/2023.arabicnlp-1.14
- Cite (ACL):
- Reem Suwaileh, Tamer Elsayed, and Muhammad Imran. 2023. IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs. In Proceedings of ArabicNLP 2023, pages 158–169, Singapore (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs (Suwaileh et al., ArabicNLP-WS 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.arabicnlp-1.14.pdf