Ondrej Linda


2020

Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction
Hung Chau | Saeid Balaneshin | Kai Liu | Ondrej Linda
Proceedings of the 14th Linguistic Annotation Workshop

Generating expert ground-truth annotations of documents can be very expensive. However, such annotations are essential for training domain-specific keyphrase extraction models, especially when using data-intensive deep learning models in specialized domains such as real estate. It is therefore critical to optimize the manual annotation process so as to maximize annotation quality while minimizing the cost of manual labor. To address this need, we explore multiple annotation strategies, including self-review and peer-review, as well as various methods of resolving annotator disagreements. We evaluate these strategies with respect to their cost and on the task of learning keyphrase extraction models, using an experimental dataset in the real-estate domain. The results demonstrate that the choice of annotation strategy should depend on the target metric, such as precision or recall.