Abstract
Annotated corpora enable supervised machine learning and data analysis. To reduce the cost of manual annotation, tasks are often assigned to internet workers whose judgments are reconciled by crowdsourcing models. We approach the problem of crowdsourcing using a framework for learning from rich prior knowledge, and we identify a family of crowdsourcing models with the novel ability to combine annotations with differing structures: e.g., document labels and word labels. Annotator judgments are given in the form of the predicted expected value of measurement functions computed over annotations and the data, unifying annotation models. Our model, a specific instance of this framework, compares favorably with previous work. Furthermore, it enables active sample selection, jointly selecting annotator, data item, and annotation structure to reduce annotation effort.
- Anthology ID:
- C18-1144
- Volume:
- Proceedings of the 27th International Conference on Computational Linguistics
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue:
- COLING
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1694–1704
- Language:
- URL:
- https://aclanthology.org/C18-1144
- DOI:
- Cite (ACL):
- Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. 2018. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1694–1704, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types (Felt et al., COLING 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/C18-1144.pdf
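The abstract describes annotator judgments as expected values of measurement functions computed over annotations and the data. The sketch below is an illustrative toy example of that idea only, not the authors' model or code: all names (`doc_label_measurement`, `word_label_measurement`, `expected_value`) are hypothetical, and the inference procedure of the paper is not shown. It merely shows how document-level and word-level judgments can be expressed through the same measurement-function interface.

```python
# Illustrative sketch (not the paper's implementation): measurement functions
# that unify document-level and word-level judgments as expected values over
# labelings. All names here are hypothetical.

from typing import Callable, Dict, Tuple

# A "labeling" assigns a document label and per-word labels to one data item.
Labeling = Tuple[str, Tuple[str, ...]]        # (doc_label, word_labels)
Measurement = Callable[[Labeling], float]     # a feature of a labeling

def doc_label_measurement(label: str) -> Measurement:
    """1 if the document carries `label`, else 0 (a document-level judgment)."""
    return lambda z: float(z[0] == label)

def word_label_measurement(index: int, label: str) -> Measurement:
    """1 if word `index` carries `label`, else 0 (a word-level judgment)."""
    return lambda z: float(z[1][index] == label)

def expected_value(measurement: Measurement,
                   posterior: Dict[Labeling, float]) -> float:
    """Model-side expectation of a measurement function under a distribution
    over labelings; this scalar is what an annotator's report is matched to."""
    return sum(p * measurement(z) for z, p in posterior.items())

# Toy posterior over labelings of a two-word item.
q = {
    ("positive", ("ADJ", "NOUN")): 0.6,
    ("negative", ("ADJ", "NOUN")): 0.3,
    ("negative", ("VERB", "NOUN")): 0.1,
}

# Both annotation structures reduce to the same interface: a scalar expectation.
print(expected_value(doc_label_measurement("positive"), q))   # 0.6
print(expected_value(word_label_measurement(0, "ADJ"), q))    # 0.9
```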