Owen Merkling
2010
CCASH: A Web Application Framework for Efficient, Distributed Language Resource Development
Paul Felt
|
Owen Merkling
|
Marc Carmen
|
Eric Ringger
|
Warren Lemmon
|
Kevin Seppi
|
Robbie Haertel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We introduce CCASH (Cost-Conscious Annotation Supervised by Humans), an extensible web application framework for cost-efficient annotation. CCASH provides a framework in which cost-efficient annotation methods such as Active Learning can be explored via user studies and afterwards applied to large annotation projects. CCASHs architecture is described as well as the technologies that it is built on. CCASH allows custom annotation tasks to be built from a growing set of useful annotation widgets. It also allows annotation methods (such as AL) to be implemented in any language. Being a web application framework, CCASH offers secure centralized data and annotation storage and facilitates collaboration among multiple annotations. By default it records timing information about each annotation and provides facilities for recording custom statistics. The CCASH framework has been used to evaluate a novel annotation strategy presented in a concurrently published paper, and will be used in the future to annotate a large Syriac corpus.
Tag Dictionaries Accelerate Manual Annotation
Marc Carmen
|
Paul Felt
|
Robbie Haertel
|
Deryle Lonsdale
|
Peter McClanahan
|
Owen Merkling
|
Eric Ringger
|
Kevin Seppi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Expert human input can contribute in various ways to facilitate automatic annotation of natural language text. For example, a part-of-speech tagger can be trained on labeled input provided offline by experts. In addition, expert input can be solicited by way of active learning to make the most of annotator expertise. However, hiring individuals to perform manual annotation is costly both in terms of money and time. This paper reports on a user study that was performed to determine the degree of effect that a part-of-speech dictionary has on a group of subjects performing the annotation task. The user study was conducted using a modular, web-based interface created specifically for text annotation tasks. The user study found that for both native and non-native English speakers a dictionary with greater than 60% coverage was effective at reducing annotation time and increasing annotator accuracy. On the basis of this study, we predict that using a part-of-speech tag dictionary with coverage greater than 60% can reduce the cost of annotation in terms of both time and money.
Search
Co-authors
- Paul Felt 2
- Marc Carmen 2
- Eric Ringger 2
- Kevin Seppi 2
- Robbie Haertel 2
- show all...
Venues
- lrec2