Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

Yurui Chang, Bochuan Cao, Lu Lin


Abstract
While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations: generating plausible yet factually incorrect content. Existing methods for mitigating this risk often rely on sampling multiple full-length generations, which introduces significant response latency and becomes ineffective when the model consistently produces hallucinated outputs with high confidence. To address these limitations, we introduce Monitoring Decoding (MD), a novel framework that dynamically monitors the generation process and selectively applies in-process interventions, focusing on revising the crucial tokens responsible for hallucinations. Instead of waiting for the completion of multiple full-length generations, we identify hallucination-prone tokens during generation using a monitor function and further refine them through a tree-based decoding strategy. This approach improves the factual accuracy and coherence of the generated output while maintaining efficiency. Experimental results demonstrate that MD consistently outperforms self-consistency-based approaches in both effectiveness and efficiency, achieving higher factual accuracy while significantly reducing computational overhead.
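To make the mechanism described above concrete, the following is a minimal sketch of a monitor-then-revise decoding loop. It is not the authors' implementation: the monitor function, the hallucination threshold, and the tree width/depth are illustrative assumptions, and `lm_step` / `monitor` are placeholder hooks standing in for the real model and the paper's learned monitor.

```python
# Minimal sketch of monitor-then-revise decoding, loosely following the
# abstract above. All names and hyperparameters are illustrative
# assumptions, NOT the authors' implementation.
from dataclasses import dataclass
import random

@dataclass
class Candidate:
    tokens: list          # partial response (token strings, for simplicity)
    score: float = 0.0    # cumulative monitor score of this branch

def lm_step(prefix, k):
    """Placeholder for the language model: return k candidate next tokens.
    A real system would sample from the model's next-token distribution."""
    vocab = ["Paris", "Lyon", "is", "the", "capital", "."]
    return random.sample(vocab, k)

def monitor(prefix, token):
    """Placeholder monitor function: score how factual the partial response
    remains if `token` is appended (higher = safer). The paper's monitor
    evaluates partial-response factuality; this stub is random."""
    return random.random()

def monitoring_decode(prompt, max_len=20, threshold=0.3, width=3, depth=2):
    tokens = list(prompt)
    while len(tokens) < max_len:
        (tok,) = lm_step(tokens, 1)
        if monitor(tokens, tok) >= threshold:
            tokens.append(tok)   # token looks factual: keep the greedy path
            continue
        # Hallucination-prone token: expand a small tree of alternatives
        # and keep the branch the monitor scores highest.
        beam = [Candidate(tokens[:])]
        for _ in range(depth):
            expanded = []
            for cand in beam:
                for alt in lm_step(cand.tokens, width):
                    expanded.append(Candidate(
                        cand.tokens + [alt],
                        cand.score + monitor(cand.tokens, alt)))
            beam = sorted(expanded, key=lambda c: c.score, reverse=True)[:width]
        tokens = beam[0].tokens
    return tokens
```

Note the design point this sketch is meant to surface: the tree-based revision fires only at positions the monitor flags, which is why the approach can stay cheaper than sampling multiple full-length generations.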
Anthology ID:
2025.findings-acl.752
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14574–14587
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.752/
DOI:
10.18653/v1/2025.findings-acl.752
Cite (ACL):
Yurui Chang, Bochuan Cao, and Lu Lin. 2025. Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14574–14587, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation (Chang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.752.pdf