Abstract
We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation.
- Anthology ID:
- L14-1168
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 140–145
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/147_Paper.pdf
- Cite (ACL):
- Paul Felt, Eric Ringger, Kevin Seppi, and Kristian Heal. 2014. Using Transfer Learning to Assist Exploratory Corpus Annotation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 140–145, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Using Transfer Learning to Assist Exploratory Corpus Annotation (Felt et al., LREC 2014)