Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction

Dain Kaplan, Neil Rubens, Simone Teufel, Takenobu Tokunaga


Abstract
Active learning (AL) is often used in corpus construction (CC) to select “informative” documents for annotation. This is ideal for focusing annotation effort when not all documents can be annotated, but it has the limitation of operating in a closed loop: it selects points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of existing models, and of specific tasks for which to use them, makes traditional AL inapplicable. In this paper we propose a novel method for model-free AL that applies AL by utilising characteristics of the phenomena to select documents for annotation. The method can also supplement traditional closed-loop AL-based CC, extending the utility of the resulting corpus beyond a single task. We introduce our tool, MOVE, and show its potential with a real-world case study.
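The abstract's argument rests on how conventional, closed-loop AL works. The sketch below is a generic illustration only: it uses standard uncertainty sampling, which is neither the paper's model-free method nor MOVE. Everything in it is a hypothetical assumption chosen for brevity: each "document" is a single number, the "model" is a threshold placed midway between the two class means, and the names `train`, `uncertainty`, and `closed_loop_al` are invented for this example. The point it shows is that every selection step depends on a model trained from labels that already exist.

```python
from typing import Callable, List, Tuple

# Hypothetical toy setup: a "document" is one numeric feature and a label is 0 or 1.
Labelled = List[Tuple[float, int]]


def train(labelled: Labelled) -> float:
    """Fit a toy 1-D classifier: the threshold midway between the two class means.

    Needs at least one labelled example of each class; with no labels it raises
    ZeroDivisionError, i.e. there is no model to drive selection.
    """
    pos = [x for x, y in labelled if y == 1]
    neg = [x for x, y in labelled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def uncertainty(threshold: float, x: float) -> float:
    """Higher score = closer to the decision boundary = less confident."""
    return -abs(x - threshold)


def closed_loop_al(labelled: Labelled, pool: List[float],
                   oracle: Callable[[float], int], budget: int) -> Labelled:
    """Generic closed-loop AL: train, pick the most uncertain item, annotate, repeat."""
    labelled = list(labelled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        model = train(labelled)                             # model from current labels
        pick = max(pool, key=lambda x: uncertainty(model, x))
        pool.remove(pick)
        labelled.append((pick, oracle(pick)))               # human annotation step
    return labelled


if __name__ == "__main__":
    seed = [(0.0, 0), (10.0, 1)]                 # hypothetical seed annotations
    result = closed_loop_al(seed, [1.0, 4.0, 6.0, 9.0],
                            oracle=lambda x: int(x >= 5.0), budget=2)
    print(result[2:])  # -> [(4.0, 0), (6.0, 1)]
```

Calling `train([])` fails because there is nothing to learn from, which mirrors the chicken-and-egg problem in the title: closed-loop selection presupposes a model, and the model presupposes an annotated corpus.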
Anthology ID:
L16-1697
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4402–4409
URL:
https://aclanthology.org/L16-1697
Cite (ACL):
Dain Kaplan, Neil Rubens, Simone Teufel, and Takenobu Tokunaga. 2016. Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4402–4409, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction (Kaplan et al., LREC 2016)
PDF:
https://preview.aclanthology.org/remove-xml-comments/L16-1697.pdf