Abstract
This paper presents a corpus of human answers in natural language collected in order to build a base of examples useful when generating natural language answers. We present the corpus and the way we acquired it. Answers correspond to questions with fixed linguistic form, focus, and topic. Answers to a given question exist for two modalities of interaction: oral and written. The whole corpus of answers was annotated manually and automatically on different levels including words from the questions being reused in the answer, the precise element answering the question (or information-answer), and completions. A detailed description of the annotations is presented. Two examples of corpus analyses are described. The first analysis shows some differences between oral and written modality especially in terms of length of the answers. The second analysis concerns the reuse of the question focus in the answers.- Anthology ID:
- L10-1208
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/301_Paper.pdf
- DOI:
- Cite (ACL):
- Anne Garcia-Fernandez, Sophie Rosset, and Anne Vilnat. 2010. MACAQ : A Multi Annotated Corpus to Study how we Adapt Answers to Various Questions. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- MACAQ : A Multi Annotated Corpus to Study how we Adapt Answers to Various Questions (Garcia-Fernandez et al., LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/301_Paper.pdf