Towards Statistical Factuality Guarantee for Large Vision-Language Models
Zhuohang Li | Chao Yan | Nicholas J Jackson | Wendi Cui | Bo Li | Jiaxin Zhang | Bradley A. Malin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Advancements in Large Vision-Language Models (LVLMs) have demonstrated impressive performance in image-conditioned text generation; however, hallucinated outputs (text that misaligns with the visual input) pose a major barrier to their use in safety-critical applications. We introduce ConfLVLM, a conformal-prediction-based framework that provides finite-sample, distribution-free statistical guarantees on the factuality of LVLM output. Treating each generated detail as a hypothesis, ConfLVLM statistically tests its factuality via efficient heuristic uncertainty measures and filters out unreliable claims. We conduct extensive experiments covering three representative application domains: general scene understanding, medical radiology report generation, and document understanding. Remarkably, ConfLVLM reduces the error rate of claims generated by LLaVa-1.5 for scene descriptions from 87.8% to 10.0% by filtering out erroneous claims with a 95.3% true positive rate. Our results further show that ConfLVLM is highly flexible: it can be applied to any black-box LVLM paired with any uncertainty measure for any image-conditioned free-form text generation task while providing a rigorous guarantee on controlling hallucination risk.
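To make the claim-filtering idea concrete, here is a minimal sketch of a split-conformal-style calibration step: score each claim with some heuristic uncertainty measure, use a labeled calibration set to pick an uncertainty threshold whose finite-sample-corrected error rate among retained claims stays below a target level, and at test time keep only claims below that threshold. This is an illustrative simplification, not the exact ConfLVLM procedure; the function names, the `+1` correction, and the uncertainty scores are assumptions for the example.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_is_false, alpha=0.10):
    """Pick the largest uncertainty threshold such that the finite-sample
    corrected error rate among the *kept* calibration claims stays below alpha.

    cal_scores   : uncertainty score per calibration claim (lower = more confident)
    cal_is_false : 1 if the claim is hallucinated / non-factual, else 0
    alpha        : target hallucination risk among retained claims
    """
    order = np.argsort(cal_scores)
    scores = np.asarray(cal_scores, dtype=float)[order]
    is_false = np.asarray(cal_is_false, dtype=float)[order]

    best = -np.inf  # default: keep nothing if no threshold is safe
    for k in range(1, len(scores) + 1):
        # empirical risk among the k most-confident claims, with a +1 correction
        risk = (is_false[:k].sum() + 1.0) / (k + 1.0)
        if risk <= alpha:
            best = scores[k - 1]
    return best

def filter_claims(claims, scores, threshold):
    """Keep only claims whose uncertainty does not exceed the calibrated threshold."""
    return [c for c, s in zip(claims, scores) if s <= threshold]

# Hypothetical usage: calibrate once, then filter claims from new generations.
# threshold = calibrate_threshold(cal_scores, cal_labels, alpha=0.10)
# kept = filter_claims(test_claims, test_scores, threshold)
```

The key design point is that the calibration step makes no assumption about the LVLM or the uncertainty measure; any black-box scorer can be plugged in, which is what gives the framework its model-agnostic flexibility.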