Abstract
This paper describes the design, collection and current status of the Hans Christian Andersen (HCA) conversation corpus. The corpus consists of five separate corpora and represents transcription and annotation of some 57 hours of English spoken and deictic gesture user-system interaction recorded mainly with children 2002-2005. The corpora were collected as part of the development and evaluation process of two consecutive research prototypes. The set-up used to collect each corpus is described as well as our use of each corpus in system development. We describe the annotation of each corpus and briefly present various uses we have made of the corpora so far. The HCA corpus was made publicly available at http://www.niceproject.com/data/ in March 2006.- Anthology ID:
- L06-1089
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/169_pdf.pdf
- DOI:
- Cite (ACL):
- Niels Ole Bernsen, Laila Dybkjær, and Svend Kiilerich. 2006. H. C. Andersen Conversation Corpus. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- H. C. Andersen Conversation Corpus (Bernsen et al., LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/169_pdf.pdf