@inproceedings{chen-etal-2024-xplainllm,
title = "{X}plain{LLM}: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in {LLM}s",
author = "Chen, Zichen and
Chen, Jianda and
Singh, Ambuj and
Sra, Misha",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.432/",
doi = "10.18653/v1/2024.emnlp-main.432",
pages = "7578--7596",
abstract = "Large Language Models (LLMs) have achieved remarkable success in natural language tasks, yet understanding their reasoning processes remains a significant challenge. We address this by introducing XplainLLM, a dataset accompanying an explanation framework designed to enhance LLM transparency and reliability. Our dataset comprises 24,204 instances where each instance interprets the LLM`s reasoning behavior using knowledge graphs (KGs) and graph attention networks (GAT), and includes explanations of LLMs such as the decoder-only Llama-3 and the encoder-only RoBERTa. XplainLLM also features a framework for generating grounded explanations and the \textit{debugger-scores} for multidimensional quality analysis. Our explanations include \textit{why-choose} and \textit{why-not-choose} components, \textit{reason-elements}, and \textit{debugger-scores} that collectively illuminate the LLM`s reasoning behavior. Our evaluations demonstrate XplainLLM`s potential to reduce hallucinations and improve grounded explanation generation in LLMs. XplainLLM is a resource for researchers and practitioners to build trust and verify the reliability of LLM outputs. Our code and dataset are publicly available."
}
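As a quick orientation to the dataset described in the abstract, here is a minimal sketch of what a single XplainLLM instance might contain. Only the field names (why-choose, why-not-choose, reason-elements, debugger-scores) come from the abstract; the class, types, and layout are hypothetical, since the actual schema is not specified in this record.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical schema for one XplainLLM instance; field names follow the
# abstract, but the real dataset layout may differ.
@dataclass
class XplainLLMInstance:
    question: str                      # input posed to the LLM
    answer: str                        # answer the LLM selected
    why_choose: str                    # grounded explanation for the chosen answer
    why_not_choose: str                # explanation for the rejected alternatives
    reason_elements: List[str]         # knowledge-graph elements grounding the reasoning
    debugger_scores: Dict[str, float]  # per-dimension explanation-quality scores
```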