Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model

Daehui Kim, Deokhyung Kang, Sangwon Ryu, Gary Lee


Abstract
Knowledge Graph-to-Text (G2T) generation involves verbalizing structured knowledge graphs into natural language text. Recent advances in Pretrained Language Models (PLMs) have improved G2T performance, but their effectiveness depends on datasets with precise graph-text alignment. However, the scarcity of high-quality, general-domain G2T datasets restricts progress in general-domain G2T generation research. To address this issue, we introduce the Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated with a novel method that leverages Large Language Models (LLMs) and Data-QuestEval. Our dataset, which contains 5.85M general-domain graph-text pairs, offers high graph-text consistency without relying on external ontologies. Experimental results demonstrate that PLMs fine-tuned on WikiOFGraph outperform those trained on other datasets across various evaluation metrics. Our method proves to be a scalable and effective solution for generating high-quality G2T data, significantly advancing the field of G2T generation.
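The abstract describes the synthesis pipeline only at a high level: an LLM produces graph-text pairs, and Data-QuestEval is involved in keeping pairs with high graph-text consistency. The sketch below illustrates one plausible reading of that idea, scoring each candidate pair and keeping those above a threshold, and is not the authors' implementation. Everything in it is an assumption: `toy_consistency` is a hypothetical string-matching stand-in for Data-QuestEval (which actually scores consistency through question generation and answering), the `<S>`/`<P>`/`<O>` linearization is an illustrative convention, and the threshold value is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (subject, predicate, object); the predicate is free-form text, since an
# ontology-free graph does not map relations onto a fixed schema.
Triple = Tuple[str, str, str]


@dataclass
class GraphTextPair:
    triples: List[Triple]
    text: str


def linearize(triples: List[Triple]) -> str:
    """Flatten a graph into one input string for a seq2seq PLM.

    The <S>/<P>/<O> markers are a hypothetical convention for illustration.
    """
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)


def toy_consistency(pair: GraphTextPair) -> float:
    """Hypothetical stand-in for Data-QuestEval: the fraction of subjects and
    objects that appear verbatim (case-insensitive) in the text."""
    text = pair.text.lower()
    entities = [e for s, _, o in pair.triples for e in (s, o)]
    if not entities:
        return 0.0
    hits = sum(1 for e in entities if e.lower() in text)
    return hits / len(entities)


def filter_pairs(
    pairs: List[GraphTextPair],
    score_fn: Callable[[GraphTextPair], float],
    threshold: float,
) -> List[GraphTextPair]:
    """Keep only pairs whose graph-text consistency score reaches threshold."""
    return [p for p in pairs if score_fn(p) >= threshold]


candidates = [
    GraphTextPair([("Seoul", "capital of", "South Korea")],
                  "Seoul is the capital of South Korea."),
    GraphTextPair([("Seoul", "located on", "Han River")],
                  "Seoul hosted the 1988 Summer Olympics."),
]
kept = filter_pairs(candidates, toy_consistency, threshold=0.9)
print(len(kept))  # 1
print(linearize(kept[0].triples))  # <S> Seoul <P> capital of <O> South Korea
```

With the toy scorer, the first pair (both entities appear in the sentence) scores 1.0 and is kept, while the second scores 0.5 because the text never mentions the Han River, so it is dropped. Passing `score_fn` as a parameter keeps the filter independent of the metric, so a real consistency scorer could be substituted without changing the filtering logic.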
Anthology ID: 2026.surgellm-1.3
Volume: Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Vivek Gupta, Kaize Ding, Harsha Kokel, Yue Zhao, Amit Agarwal, Yu Wang, Michael Glass, Yu Zhang, Kavitha Srinivas, Xiusi Chen, Oktie Hassanzadeh, Qi Zhu, Shuaichen Chang, Yuan Luo
Venues: SURGeLLM | WS
Publisher: Association for Computational Linguistics
Pages: 52–69
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.surgellm-1.3/
Cite (ACL): Daehui Kim, Deokhyung Kang, Sangwon Ryu, and Gary Lee. 2026. Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model. In Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026), pages 52–69, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model (Kim et al., SURGeLLM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.surgellm-1.3.pdf