GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Zhijing Jin; Qipeng Guo; Xipeng Qiu; Zheng Zhang

doi:10.18653/v1/2020.coling-main.217

Abstract

Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text examples and 1.3M graph examples. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Anthology ID: 2020.coling-main.217
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2398–2409
URL: https://aclanthology.org/2020.coling-main.217
DOI: 10.18653/v1/2020.coling-main.217
Cite (ACL): Zhijing Jin, Qipeng Guo, Xipeng Qiu, and Zheng Zhang. 2020. GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2398–2409, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation (Jin et al., COLING 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.coling-main.217.pdf
Data: GenWiki, E2E, RoboCup, WikiBio