Semi-Automated Elicitation Corpus Generation
Alison Alvarez, Lori Levin, Robert Frederking, Erik Peterson, Jeff Good
Abstract
In this document we describe a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high-quality, word-aligned sentence pairs. The corpus sentences are automatically generated from detailed feature structures using the GenKit generation program. The feature structures themselves are automatically generated from information provided by a linguist using our corpus specification software. This helps us build small, flexible corpora for testing and developing machine translation systems.
- Anthology ID:
- 2005.mtsummit-posters.10
- Volume:
- Proceedings of Machine Translation Summit X: Posters
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 388–395
- URL:
- https://aclanthology.org/2005.mtsummit-posters.10
- Cite (ACL):
- Alison Alvarez, Lori Levin, Robert Frederking, Erik Peterson, and Jeff Good. 2005. Semi-Automated Elicitation Corpus Generation. In Proceedings of Machine Translation Summit X: Posters, pages 388–395, Phuket, Thailand.
- Cite (Informal):
- Semi-Automated Elicitation Corpus Generation (Alvarez et al., MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/landing_page/2005.mtsummit-posters.10.pdf