HalluMeasure: Fine-grained Hallucination Measurement Using Chain-of-Thought Reasoning

Shayan Ali Akbar, Md Mosharaf Hossain, Tess Wood, Si-Chi Chin, Erica M Salinas, Victor Alvarez, Erwin Cornejo


Abstract
Automating the measurement of hallucinations in LLM-generated responses is a challenging task, as it requires careful investigation of each factual claim in a response. In this paper, we introduce HalluMeasure, a new LLM-based hallucination detection mechanism that decomposes an LLM response into atomic claims and evaluates each atomic claim against the provided reference context. The model uses a step-by-step reasoning process called Chain-of-Thought (CoT) and can identify 3 major categories of hallucinations (e.g., contradiction) as well as 10 more specific subtypes (e.g., overgeneralization), which help to identify the reasons behind the hallucination errors. Specifically, we explore four different configurations for HalluMeasure’s classifier: with and without CoT prompting, and using a single classifier call to classify all claims versus separate calls for each claim. The best-performing configuration (with CoT and separate calls for each claim) demonstrates significant improvements in detecting hallucinations, achieving a 10-point increase in F1 score on our TechNewsSumm dataset and a 3-point increase in AUC ROC on the SummEval dataset, compared to three baseline models (RefChecker, AlignScore, and Vectara HHEM). We further show reasonable accuracy in detecting 10 novel error subtypes of hallucinations (where even humans struggle in classification), derived from linguistic analysis of the errors made by the LLMs.
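The pipeline described in the abstract — decompose a response into atomic claims, then classify each claim against the reference context — can be sketched roughly as follows. This is a minimal illustrative skeleton, not the paper's implementation: the function names are hypothetical, and the sentence-split decomposition and substring-match classifier are stand-ins for the paper's LLM calls with CoT prompting.

```python
# Minimal sketch of a HalluMeasure-style pipeline (hypothetical names; the
# paper's actual prompts and LLM calls are not reproduced here).

def decompose_into_claims(response: str) -> list[str]:
    # Stand-in for the LLM-based claim-decomposition step: here we simply
    # treat each sentence as one atomic claim.
    return [s.strip() for s in response.split(".") if s.strip()]

def classify_claim(claim: str, context: str) -> str:
    # Stand-in for the per-claim classifier call. The paper's best-performing
    # configuration prompts an LLM with CoT reasoning, issuing a separate
    # call per claim; here we fake it with a substring check.
    return "supported" if claim.lower() in context.lower() else "unsupported"

def hallucination_rate(response: str, context: str) -> float:
    # Aggregate per-claim labels into a response-level score.
    claims = decompose_into_claims(response)
    labels = [classify_claim(c, context) for c in claims]
    flagged = sum(1 for label in labels if label != "supported")
    return flagged / len(claims) if claims else 0.0

context = "The company launched a new phone. It has a 6-inch screen."
response = "The company launched a new phone. The phone costs $99."
print(hallucination_rate(response, context))  # 0.5 — one of two claims unsupported
```

The design point the paper's ablation makes is visible in this shape: the classification granularity (one call for all claims vs. one call per claim) and the prompting style (with vs. without CoT) are independent knobs, and the per-claim, with-CoT combination performed best.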
Anthology ID:
2024.emnlp-main.837
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15020–15037
URL:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.837/
DOI:
10.18653/v1/2024.emnlp-main.837
Cite (ACL):
Shayan Ali Akbar, Md Mosharaf Hossain, Tess Wood, Si-Chi Chin, Erica M Salinas, Victor Alvarez, and Erwin Cornejo. 2024. HalluMeasure: Fine-grained Hallucination Measurement Using Chain-of-Thought Reasoning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15020–15037, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
HalluMeasure: Fine-grained Hallucination Measurement Using Chain-of-Thought Reasoning (Akbar et al., EMNLP 2024)
PDF:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.837.pdf