Abstract
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.

- Anthology ID:
- W17-5525
- Volume:
- Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
- Month:
- August
- Year:
- 2017
- Address:
- Saarbrücken, Germany
- Editors:
- Kristiina Jokinen, Manfred Stede, David DeVault, Annie Louis
- Venue:
- SIGDIAL
- SIG:
- SIGDIAL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 201–206
- URL:
- https://aclanthology.org/W17-5525
- DOI:
- 10.18653/v1/W17-5525
- Cite (ACL):
- Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017. The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany. Association for Computational Linguistics.
- Cite (Informal):
- The E2E Dataset: New Challenges For End-to-End Generation (Novikova et al., SIGDIAL 2017)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/W17-5525.pdf
- Data
- E2E