Decomposing Unitization and Typing for Efficient and Consistent Span-Bound Concept Annotation

Nupoor Gandhi, Michael Bada, Emma Strubell


Abstract
In specialized domains that require expert annotators and high inter-annotator agreement, high-quality datasets with span-bound semantic concept annotations remain expensive to develop. Substantial resources are typically spent on unitizing, the task of identifying precise span boundaries for entity mentions. Unitizing is a significant source of inter-annotator disagreement, a poor use of expensive domain expertise, and very time-consuming. We propose a lighter annotation procedure that concentrates manual efforts on typed position annotations, marking positions in the text that overlap with mentions of each entity type, abstracting away span boundary decisions. With as few as 100-200 example sentences, we train span boundary detection models to unitize typed position annotations. Through evaluation over three datasets: CRAFT (biomedical), GENIA (molecular biology), and POLIANNA (climate/energy policy text), we demonstrate that (1) annotating typed positions in the text instead of full concept annotation is a more efficient use of time in low-resource settings, and (2) model-inferred span boundaries result in higher agreement at both the annotator training and corpus annotation phases, without sacrificing utility.
Anthology ID:
2026.findings-acl.1728
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
34616–34631
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1728/
Cite (ACL):
Nupoor Gandhi, Michael Bada, and Emma Strubell. 2026. Decomposing Unitization and Typing for Efficient and Consistent Span-Bound Concept Annotation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34616–34631, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Decomposing Unitization and Typing for Efficient and Consistent Span-Bound Concept Annotation (Gandhi et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1728.pdf
Checklist:
 2026.findings-acl.1728.checklist.pdf