Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
Kazuaki Maeda, Haejoong Lee, Shawn Medero, Julie Medero, Robert Parker, Stephanie Strassel
Abstract
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources - data, annotations, tools, standards and best practices - for many sponsored projects. The programming staff at LDC has created the tools and technical infrastructures to support the data creation efforts for these projects, creating tools and technical infrastructures for all aspects of data creation projects: data scouting, data collection, data selection, annotation, search, data tracking and worklow management. This paper introduces a number of samples of LDC programming staffs work, with particular focus on the recent additions and updates to the suite of software tools developed by LDC. Tools introduced include the GScout Web Data Scouting Tool, LDC Data Selection Toolkit, ACK - Annotation Collection Kit, XTrans Transcription and Speech Annotation Tool, GALE Distillation Toolkit, and the GALE MT Post Editing Workflow Management System.- Anthology ID:
- L08-1465
- Volume:
- Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
- Month:
- May
- Year:
- 2008
- Address:
- Marrakech, Morocco
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/775_paper.pdf
- DOI:
- Cite (ACL):
- Kazuaki Maeda, Haejoong Lee, Shawn Medero, Julie Medero, Robert Parker, and Stephanie Strassel. 2008. Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
- Cite (Informal):
- Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium (Maeda et al., LREC 2008)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/775_paper.pdf