Jiarun Fu

2026

Hallucinations arise when large language models (LLMs) guess rather than acknowledge their underlying uncertainty. Existing static strategies for mitigating hallucinations have been only partially successful, largely because they do not explicitly model the information gain from interacting with the external environment. What is needed is a general method that proactively steers users toward informative clarifications, thereby unlocking the model’s effective capacity under underspecified inputs. We model the uncertainty of LLMs in interactive settings and uncover a mechanism of active calibration between model concepts and human evaluations that improves reliability. We show that the calibration error of LLM density estimation admits a non-vanishing lower bound under non-interactive learning, whereas interaction empirically reduces it. We further show that calibration error identifies informative queries and that calibration can be accelerated by shifting query distributions from imbalanced to balanced regimes. Guided by these insights, we propose a calibration-driven Interactive Learning Strategy (ILS) that selects clarification queries by optimizing calibration error, providing both theoretical guarantees and empirical gains in reliability. Code and data are available at https://github.com/zhouyeah215/Demystifying_Uncertainty.

2025

Large language models (LLMs) exhibit remarkable text-generation capabilities yet struggle with factual consistency, motivating growing interest in factuality verification. Existing factuality verification methods typically follow a Decompose-Then-Verify paradigm, which improves granularity but suffers from poor scalability and efficiency. We propose a novel Decompose-Embed-Interact paradigm that shifts factuality verification from costly text-level reasoning to efficient alignment in embedding space, mitigating the scalability bottlenecks and computational inefficiencies inherent to prior approaches. Although the proposed paradigm promises scalable verification, implementing it raises three practical challenges: efficient decomposition, factually faithful embedding, and accurate verification in embedding space. To address them, we introduce E-Verify, a lightweight framework with three dedicated modules, each aligned with one stage of the paradigm and built to preserve scalability and efficiency. Experiments demonstrate that E-Verify significantly improves both decomposition and verification efficiency while maintaining competitive accuracy. These results confirm that the proposed paradigm enables scalable and fine-grained factuality verification with minimal performance trade-offs.

Co-authors

Hao Li (1)

Ye Yuan (1)

Venues

ACL (1)
Findings (1)

Fix author