Abstract
Named Entity Recognition (NER) is an essential component of many Natural Language Processing pipelines. However, building these language-dependent models requires large amounts of annotated data. Crowdsourcing has emerged as a scalable solution for collecting and enriching data more time-efficiently. To manage these annotations at scale, it is important to predict completion timelines and to compute fair pricing for workers in advance. Both goals require knowing how much effort each task will take to complete. In this paper, we investigate which variables influence the time a human spends on a named entity annotation task. Our results are two-fold: first, an understanding of the effort-impacting factors, which we divide into cognitive load and input length; and second, the performance of the prediction itself. On the latter, through model adaptation and feature engineering, we attained a Root Mean Squared Error (RMSE) of 25.68 words per minute with a Nearest Neighbors model.
- Anthology ID:
- 2020.lrec-1.37
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 298–306
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.37
- Cite (ACL):
- Inês Gomes, Rui Correia, Jorge Ribeiro, and João Freitas. 2020. Effort Estimation in Named Entity Tagging Tasks. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 298–306, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Effort Estimation in Named Entity Tagging Tasks (Gomes et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.lrec-1.37.pdf