TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs
Qianqian Qi, Zhenyun Deng, Yonghua Zhu, Lia Jisoo Lee, Michael Witbrock, Jiamou Liu
Abstract
We introduce TaKG, a new table-to-text generation dataset with the following highlights: (1) TaKG defines a long-text (paragraph-level) generation task as opposed to well-established short-text (sentence-level) generation datasets. (2) TaKG is the first large-scale dataset for this task, containing three application domains and ~750,000 samples. (3) To address the divergence phenomenon, TaKG enhances table input using external knowledge graphs, extracted by a new Wikidata-based method. We then propose a new Transformer-based multimodal sequence-to-sequence architecture for TaKG that integrates two pretrained language models RoBERTa and GPT-2. Our model shows reliable performance on long-text generation across a variety of metrics, and outperforms existing models for short-text generation tasks.
- Anthology ID:
- 2022.findings-aacl.17
- Volume:
- Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
- Month:
- November
- Year:
- 2022
- Address:
- Online only
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 176–187
- URL:
- https://aclanthology.org/2022.findings-aacl.17
- Cite (ACL):
- Qianqian Qi, Zhenyun Deng, Yonghua Zhu, Lia Jisoo Lee, Michael Witbrock, and Jiamou Liu. 2022. TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 176–187, Online only. Association for Computational Linguistics.
- Cite (Informal):
- TaKG: A New Dataset for Paragraph-level Table-to-Text Generation Enhanced with Knowledge Graphs (Qi et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.findings-aacl.17.pdf