Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types

Paul Felt, Eric Ringger, Jordan Boyd-Graber, Kevin Seppi


Abstract
Annotated corpora enable supervised machine learning and data analysis. To reduce the cost of manual annotation, tasks are often assigned to internet workers whose judgments are reconciled by crowdsourcing models. We approach the crowdsourcing problem using a framework for learning from rich prior knowledge, and we identify a family of crowdsourcing models with the novel ability to combine annotations with differing structures: e.g., document labels and word labels. Annotator judgments are expressed as the expected values of measurement functions computed over annotations and the data, unifying annotation models. Our model, a specific instance of this framework, compares favorably with previous work. Furthermore, it enables active sample selection, jointly choosing the annotator, data item, and annotation structure to reduce annotation effort.
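
The measurement-function view summarized above can be made concrete with a small sketch. The snippet below is not the authors' code; names such as `measurement_value` and `label_posterior` are illustrative assumptions. It shows one way a single annotator judgment on a document label could enter as a target on the expected value of an indicator measurement under a model's posterior over labels.

```python
# A minimal sketch, assuming an indicator-style measurement function and a
# per-document posterior over class labels (both illustrative, not from the paper).
import numpy as np

def measurement_value(annotated_label: int, latent_label: int) -> float:
    """Indicator measurement: 1 if the latent label matches the annotation."""
    return 1.0 if annotated_label == latent_label else 0.0

def expected_measurement(label_posterior: np.ndarray, annotated_label: int) -> float:
    """Expected value of the measurement under the model's posterior over labels.

    In a learning-from-measurements setup, this expectation would be constrained
    to stay close to a target derived from the annotator's judgment.
    """
    return sum(p * measurement_value(annotated_label, y)
               for y, p in enumerate(label_posterior))

# Example: a 3-class document whose posterior favors class 2, annotated as class 2.
posterior = np.array([0.1, 0.2, 0.7])
print(expected_measurement(posterior, annotated_label=2))  # 0.7
```

Because word-label judgments can be encoded the same way (as expectations of measurement functions over word-level annotations), this representation is what lets annotations with differing structures be combined in one model.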
Anthology ID: C18-1144
Volume: Proceedings of the 27th International Conference on Computational Linguistics
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1694–1704
URL: https://aclanthology.org/C18-1144
Cite (ACL): Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. 2018. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1694–1704, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types (Felt et al., COLING 2018)
PDF: https://preview.aclanthology.org/nschneid-patch-2/C18-1144.pdf