Exploring the impact of data representation on neural data-to-text generation
David M. Howcroft, Lewis N. Watson, Olesia Nedopas, Dimitra Gkatzia
Abstract
A relatively under-explored area in research on neural natural language generation is the impact of the data representation on text quality. Here we report experiments on two leading input representations for data-to-text generation: attribute-value pairs and Resource Description Framework (RDF) triples. Evaluating the performance of encoder-decoder seq2seq models as well as recent large language models (LLMs) with both automated metrics and human evaluation, we find that the input representation does not seem to have a large impact on the performance of either purpose-built seq2seq models or LLMs. Finally, we present an error analysis of the texts generated by the LLMs and provide some insights into where these models fail.
- Anthology ID:
- 2024.inlg-main.20
- Volume:
- Proceedings of the 17th International Natural Language Generation Conference
- Month:
- September
- Year:
- 2024
- Address:
- Tokyo, Japan
- Editors:
- Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 243–253
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.inlg-main.20/
- Cite (ACL):
- David M. Howcroft, Lewis N. Watson, Olesia Nedopas, and Dimitra Gkatzia. 2024. Exploring the impact of data representation on neural data-to-text generation. In Proceedings of the 17th International Natural Language Generation Conference, pages 243–253, Tokyo, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Exploring the impact of data representation on neural data-to-text generation (Howcroft et al., INLG 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.inlg-main.20.pdf