Crossroads, Buildings and Neighborhoods: A Dataset for Fine-grained Location Recognition

Pei Chen, Haotian Xu, Cheng Zhang, Ruihong Huang


Abstract
General-domain Named Entity Recognition (NER) datasets like CoNLL-2003 mostly annotate coarse-grained location entities such as a country or a city. But many applications require identifying fine-grained locations in text and mapping them precisely to geographic sites, e.g., a crossroad, an apartment building, or a grocery store. In this paper, we introduce a new dataset, HarveyNER, with fine-grained locations annotated in tweets. This dataset presents unique challenges and features many complex and long location mentions in informal descriptions. We built strong baseline models using Curriculum Learning and experimented with different heuristic curricula to better recognize difficult location mentions. Experimental results show that the simple curricula improve the system’s performance on hard cases as well as its overall performance, and outperform several other baseline systems. The dataset and the baseline models can be found at https://github.com/brickee/HarveyNER.
Anthology ID:
2022.naacl-main.243
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3329–3339
URL:
https://aclanthology.org/2022.naacl-main.243
DOI:
10.18653/v1/2022.naacl-main.243
Cite (ACL):
Pei Chen, Haotian Xu, Cheng Zhang, and Ruihong Huang. 2022. Crossroads, Buildings and Neighborhoods: A Dataset for Fine-grained Location Recognition. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3329–3339, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Crossroads, Buildings and Neighborhoods: A Dataset for Fine-grained Location Recognition (Chen et al., NAACL 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.naacl-main.243.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2022.naacl-main.243.mp4
Code:
brickee/harveyner
Data:
HarveyNER, OntoNotes 5.0