@inproceedings{cha-lee-2024-pre,
    title     = {Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts},
    author    = {Cha, Taehun and
                 Lee, Donghun},
    editor    = {Al-Onaizan, Yaser and
                 Bansal, Mohit and
                 Chen, Yun-Nung},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
    month     = nov,
    year      = {2024},
    address   = {Miami, Florida, USA},
    publisher = {Association for Computational Linguistics},
    url       = {https://aclanthology.org/2024.findings-emnlp.738/},
    doi       = {10.18653/v1/2024.findings-emnlp.738},
    pages     = {12630--12639},
    abstract  = {In this work, we show the pre-trained language models return distinguishable generation probability and uncertainty distribution to unfaithfully hallucinated texts, regardless of their size and structure. By examining 24 models on 6 data sets, we find out that 88--98\% of cases return statistically significantly distinguishable generation probability and uncertainty distributions. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines by achieving higher faithfulness metrics while maintaining sound general text quality measures.}
}
Markdown (Informal)
[Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts](https://aclanthology.org/2024.findings-emnlp.738/) (Cha & Lee, Findings 2024)
ACL