Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction

Hung Chau, Saeid Balaneshin, Kai Liu, Ondrej Linda


Abstract
Generating expert ground-truth annotations of documents can be a very expensive process. However, such annotations are essential for training domain-specific keyphrase extraction models, especially when utilizing data-intensive deep learning models in specialized domains such as real estate. Therefore, it is critical to optimize the manual annotation process, maximizing the quality of the annotations while minimizing the cost of manual labor. To address this need, we explore multiple annotation strategies, including self-review and peer-review, as well as various methods of resolving annotator disagreements. We evaluate these annotation strategies both in terms of their cost and on the downstream task of learning keyphrase extraction models, using an experimental dataset in the real-estate domain. The results demonstrate that the choice of annotation strategy should depend on the metric of interest, such as precision or recall.
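The abstract mentions resolving annotator disagreements and a precision/recall tradeoff between strategies. As an illustration only (the specific resolution methods used in the paper are not described here), the following Python sketch shows three common ways of aggregating per-annotator keyphrase sets; the names and data are hypothetical, and the choice between union, intersection, and majority vote loosely mirrors the recall/precision tradeoff.

```python
from collections import Counter
from typing import List, Set


def resolve_keyphrases(annotations: List[Set[str]], strategy: str = "majority") -> Set[str]:
    """Combine per-annotator keyphrase sets for a single document.

    annotations: one set of keyphrases per annotator.
    strategy: "union" favors recall, "intersection" favors precision,
              "majority" keeps phrases chosen by more than half the annotators.
    (Illustrative helper, not the aggregation used in the paper.)
    """
    if not annotations:
        return set()
    if strategy == "union":
        return set.union(*annotations)
    if strategy == "intersection":
        return set.intersection(*annotations)
    counts = Counter(kp for ann in annotations for kp in ann)
    return {kp for kp, c in counts.items() if c > len(annotations) / 2}


# Hypothetical example: three annotators labeling the same real-estate listing.
anns = [
    {"open floor plan", "granite countertops", "walk-in closet"},
    {"open floor plan", "granite countertops"},
    {"open floor plan", "hardwood floors"},
]
print(resolve_keyphrases(anns, "union"))         # broadest set, higher recall
print(resolve_keyphrases(anns, "intersection"))  # only unanimous phrases, higher precision
print(resolve_keyphrases(anns, "majority"))      # middle ground
```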
Anthology ID:
2020.law-1.7
Volume:
Proceedings of the 14th Linguistic Annotation Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
74–86
URL:
https://aclanthology.org/2020.law-1.7
Cite (ACL):
Hung Chau, Saeid Balaneshin, Kai Liu, and Ondrej Linda. 2020. Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction. In Proceedings of the 14th Linguistic Annotation Workshop, pages 74–86, Barcelona, Spain. Association for Computational Linguistics.
Cite (Informal):
Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction (Chau et al., LAW 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.law-1.7.pdf