CEBC: Conformal Evidence-Bounded Control for Low-Hallucination Vision–Language Generation

Ashish Mishra, Tarun Kumar, Arpit Shah, Suparna Bhattacharya, Martin Foltin


Abstract
Hallucinated object mentions remain a persistent failure mode of vision–language models (VLMs) across generation tasks such as image captioning and visual question answering: outputs may be fluent yet include entities not supported by visual evidence. Existing mitigation approaches often reduce hallucinations at the cost of degraded generation quality or require expensive retraining and task-specific supervision. We introduce CEBC, a lightweight, training-free framework for low-hallucination vision–language generation based on conformal evidence-bounded minimal editing. CEBC first produces a strong base output (via greedy decoding or best-of-K sampling), then applies an evidence-bounded editing step that minimally revises or suppresses unsupported object mentions using constraints derived from an external visual detector. Crucially, the evidence threshold is conformally calibrated on a small held-out set via quantiles of detector confidence scores, enabling explicit and controllable hallucination risk at test time. To balance factuality and informativeness, we further introduce a risk-first, quality-aware selection rule that prioritizes evidence-consistent generations while regularizing unnecessary length or lexical drift. Extensive experiments on MS-COCO and GQA for image captioning, and POPE for VQA evaluation, across multiple VLMs demonstrate that CEBC consistently reduces hallucination rates (CHAIR_S, CHAIR_I, POPE) while maintaining or improving standard generation quality metrics (CIDEr, BLEU, CLIPScore). CEBC establishes a stronger factuality–quality Pareto frontier without any additional model training or access to paired supervision beyond an off-the-shelf detector.
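The abstract does not spell out the calibration procedure, so the following is only a minimal sketch of one standard split-conformal construction that fits the description "quantiles of detector confidence scores". It assumes the held-out set yields detector confidences for mentions known to be unsupported, and that test mentions are exchangeable with them. The function names `calibrate_threshold` and `filter_mentions`, the risk level `alpha`, and all scores are hypothetical and not taken from the paper.

```python
import math


def calibrate_threshold(unsupported_scores, alpha):
    """Pick an evidence threshold tau from held-out detector confidences.

    Assumption (not from the paper): `unsupported_scores` holds detector
    confidences for calibration mentions known to be hallucinated. Using the
    ceil((n + 1) * (1 - alpha))-th smallest score as tau means that, under
    exchangeability, a new unsupported mention scores above tau with
    probability at most alpha.
    """
    n = len(unsupported_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept nothing.
        return float("inf")
    return sorted(unsupported_scores)[k - 1]


def filter_mentions(mention_scores, tau):
    """Keep only object mentions whose detector confidence exceeds tau."""
    return [m for m, s in mention_scores.items() if s > tau]


# Hypothetical calibration scores and a hypothetical caption's mentions.
calibration = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80]
tau = calibrate_threshold(calibration, alpha=0.25)
kept = filter_mentions({"dog": 0.92, "frisbee": 0.70, "bench": 0.30}, tau)
```

Here suppression is reduced to dropping low-evidence mentions. The editing step described in the abstract also revises text minimally and is followed by a quality-aware selection rule, neither of which this sketch attempts to reproduce.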
Anthology ID:
2026.acl-long.2142
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
46193–46206
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2142/
Cite (ACL):
Ashish Mishra, Tarun Kumar, Arpit Shah, Suparna Bhattacharya, and Martin Foltin. 2026. CEBC: Conformal Evidence-Bounded Control for Low-Hallucination Vision–Language Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 46193–46206, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CEBC: Conformal Evidence-Bounded Control for Low-Hallucination Vision–Language Generation (Mishra et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2142.pdf
Checklist:
2026.acl-long.2142.checklist.pdf