Abstract
A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.
- Anthology ID:
- W17-3537
- Volume:
- Proceedings of the 10th International Conference on Natural Language Generation
- Month:
- September
- Year:
- 2017
- Address:
- Santiago de Compostela, Spain
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 238–242
- URL:
- https://aclanthology.org/W17-3537
- DOI:
- 10.18653/v1/W17-3537
- Cite (ACL):
- Laura Perez-Beltrachini and Claire Gardent. 2017. Analysing Data-To-Text Generation Benchmarks. In Proceedings of the 10th International Conference on Natural Language Generation, pages 238–242, Santiago de Compostela, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Analysing Data-To-Text Generation Benchmarks (Perez-Beltrachini & Gardent, INLG 2017)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/W17-3537.pdf