CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

Boxuan Zhang, Ruqi Zhang


Abstract
Large language models (LLMs) excel in many tasks but struggle to accurately quantify uncertainty in their generated responses. This limitation makes it challenging to detect misinformation and ensure reliable decision-making. Existing uncertainty quantification (UQ) methods for LLMs are primarily prompt-wise rather than response-wise, and often require multiple response samples, which leads to inefficiency. Moreover, LLMs have been shown to be overconfident, particularly when using reasoning steps to derive their answers. In this work, we introduce a novel approach to quantify response-wise uncertainty by integrating LLMs’ inherent reasoning capabilities through Chain-of-Thought (CoT) into the UQ process. Our CoT-UQ framework captures critical information during inference by extracting keywords from each reasoning step and assessing their importance to the final answer. The uncertainty scores of the keywords are then aggregated according to their significance to produce a final uncertainty estimate. We conduct extensive experiments on the Llama family, with model sizes ranging from 8B to 13B, across logical and mathematical reasoning tasks. Experimental results demonstrate that CoT-UQ significantly outperforms existing UQ methods, achieving an average AUROC improvement of 5.9%.
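The aggregation step described in the abstract can be pictured as an importance-weighted average over keyword-level uncertainty scores. The sketch below is illustrative only: the names (`KeywordScore`, `aggregate_uncertainty`), the choice of uncertainty measure (e.g., negative mean token log-probability), and the simple weighted-average scheme are assumptions for exposition, not the paper's actual formulation.

```python
# Hypothetical sketch of importance-weighted uncertainty aggregation.
# All names and the weighting scheme are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class KeywordScore:
    keyword: str          # keyword extracted from a reasoning step
    uncertainty: float    # e.g., negative mean log-probability of its tokens
    importance: float     # model-assessed relevance to the final answer

def aggregate_uncertainty(scores: list[KeywordScore]) -> float:
    """Combine keyword-level uncertainties into one response-wise
    estimate, weighting each keyword by its importance to the answer."""
    total_weight = sum(s.importance for s in scores)
    if total_weight == 0:
        return 0.0
    return sum(s.uncertainty * s.importance for s in scores) / total_weight

# Example: two keywords from a CoT trace, one far more decisive.
scores = [
    KeywordScore("Paris", uncertainty=0.2, importance=0.9),
    KeywordScore("capital", uncertainty=0.8, importance=0.1),
]
print(aggregate_uncertainty(scores))  # ~0.26 -> low overall uncertainty
```

Under this reading, a highly uncertain but unimportant keyword barely moves the response-wise estimate, while uncertainty on a decisive keyword dominates it.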
Anthology ID:
2025.findings-acl.1339
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
26114–26133
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1339/
DOI:
10.18653/v1/2025.findings-acl.1339
Cite (ACL):
Boxuan Zhang and Ruqi Zhang. 2025. CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26114–26133, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought (Zhang & Zhang, Findings 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1339.pdf