@inproceedings{shi-etal-2023-hallucination,
title = "Hallucination Mitigation in Natural Language Generation from Large-Scale Open-Domain Knowledge Graphs",
author = "Shi, Xiao and
Zhu, Zhengyuan and
Zhang, Zeyu and
Li, Chengkai",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.emnlp-main.770/",
doi = "10.18653/v1/2023.emnlp-main.770",
pages = "12506--12521",
abstract = "In generating natural language descriptions for knowledge graph triples, prior works used either small-scale, human-annotated datasets or datasets with limited variety of graph shapes, e.g., those having mostly star graphs. Graph-to-text models trained and evaluated on such datasets are largely not assessed for more realistic large-scale, open-domain settings. We introduce a new dataset, GraphNarrative, to fill this gap. Fine-tuning transformer-based pre-trained language models has achieved state-of-the-art performance among graph-to-text models. However, this method suffers from information hallucination{---}the generated text may contain fabricated facts not present in input graphs. We propose a novel approach that, given a graph-sentence pair in GraphNarrative, trims the sentence to eliminate portions that are not present in the corresponding graph, by utilizing the sentence`s dependency parse tree. Our experiment results verify this approach using models trained on GraphNarrative and existing datasets. The dataset, source code, and trained models are released at https://github.com/idirlab/graphnarrator."
}
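
The trimming step described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical Python sketch of dependency-parse-based sentence trimming, assuming spaCy with its `en_core_web_sm` model. It is not the authors' released implementation (see the `graphnarrator` repository linked above); the function name `trim_sentence` and the exact-match entity heuristic are illustrative assumptions.

```python
# Hypothetical sketch of trimming a sentence via its dependency parse,
# keeping only the portions grounded in the input graph's entities.
# NOT the paper's released code (see https://github.com/idirlab/graphnarrator).
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def trim_sentence(sentence: str, graph_entities: set[str]) -> str:
    """Keep tokens that mention a graph entity, plus their syntactic
    ancestors, so the trimmed output stays connected in the parse tree."""
    doc = nlp(sentence)
    keep: set[int] = set()
    lowered = [t.text.lower() for t in doc]
    # Mark token spans that lexically match an entity from the graph.
    for ent in graph_entities:
        ent_tokens = ent.lower().split()
        n = len(ent_tokens)
        for i in range(len(doc) - n + 1):
            if lowered[i:i + n] == ent_tokens:
                keep.update(range(i, i + n))
    # Walk up the dependency tree so kept tokens remain connected
    # through their governing heads (verbs, prepositions, etc.).
    for i in list(keep):
        for anc in doc[i].ancestors:
            keep.add(anc.i)
    return " ".join(doc[i].text for i in sorted(keep))

# The appositive about the city is absent from the graph, so it is trimmed.
print(trim_sentence(
    "Barack Obama was born in Honolulu, a city on the island of Oahu.",
    {"Barack Obama", "Honolulu"},
))
```

Walking up to ancestors keeps matched tokens connected through their governing words; a fuller implementation along the paper's lines would also retain function-word children (auxiliaries, determiners) so the trimmed sentence remains grammatical.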