Abstract
Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and beyond English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three varieties: English, Danish, and DialectX. We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.
- Anthology ID:
- 2024.unimplicit-1.7
- Volume:
- Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
- Month:
- March
- Year:
- 2024
- Address:
- Malta
- Editors:
- Valentina Pyatkin, Daniel Fried, Elias Stengel-Eskin, Alisa Liu, Sandro Pezzelle
- Venues:
- unimplicit | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 73–81
- URL:
- https://aclanthology.org/2024.unimplicit-1.7
- Cite (ACL):
- Siyao Peng, Zihang Sun, Sebastian Loftus, and Barbara Plank. 2024. Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations. In Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language, pages 73–81, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations (Peng et al., unimplicit-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.unimplicit-1.7.pdf