CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, Frederic Herledan
Abstract
Machine reading comprehension is a task related to Question-Answering in which questions are not generic in scope but concern a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data currently exists only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is thereby reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.
- Anthology ID:
- D19-5803
- Volume:
- Proceedings of the 2nd Workshop on Machine Reading for Question Answering
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19–26
- URL:
- https://aclanthology.org/D19-5803
- DOI:
- 10.18653/v1/D19-5803
- Cite (ACL):
- Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, and Frederic Herledan. 2019. CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 19–26, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations (Bechet et al., 2019)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/D19-5803.pdf