Abstract
The goal of active learning is to minimise the cost of producing an annotated dataset. It assumes that annotators are perfect, i.e., that they always choose the correct labels. In practice, however, annotators are not infallible, and they are likely to assign incorrect labels to some instances. Proactive learning is a generalisation of active learning that can model different kinds of annotators. Although proactive learning has been applied to certain labelling tasks, such as text classification, there is little work on its application to named entity (NE) tagging. In this paper, we propose a proactive learning method for producing NE annotated corpora, using two annotators who have different levels of expertise and who charge different amounts based on their levels of experience. To optimise both cost and annotation quality, we also propose a mechanism to present multiple sentences to annotators at each iteration. Experimental results for several corpora show that our method facilitates the construction of high-quality NE labelled datasets at minimal cost.
- Anthology ID:
- W17-2314
- Volume:
- BioNLP 2017
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 117–125
- URL:
- https://aclanthology.org/W17-2314
- DOI:
- 10.18653/v1/W17-2314
- Cite (ACL):
- Maolin Li, Nhung Nguyen, and Sophia Ananiadou. 2017. Proactive Learning for Named Entity Recognition. In BioNLP 2017, pages 117–125, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Proactive Learning for Named Entity Recognition (Li et al., BioNLP 2017)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/W17-2314.pdf
- Data
- GENIA