LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation

Caiqi Zhang; Xiaochen Zhu; Chengzu Li; Nigel Collier; Andreas Vlachos

Abstract

Hallucination remains a major challenge for the safe and trustworthy deployment of large language models (LLMs) in factual content generation. Prior work has explored confidence estimation as an effective approach to hallucination detection, but often relies on post-hoc self-consistency methods that require computationally expensive sampling. Verbalized confidence offers a more efficient alternative, but existing approaches are largely limited to short-form question answering (QA) tasks and do not generalize well to open-ended generation. In this paper, we propose LoVeC (Long-form Verbalized Confidence), a novel reinforcement learning (RL)-based method that trains LLMs to append an on-the-fly numerical confidence score to each generated statement during long-form generation. The confidence score serves as a direct and interpretable signal of the factuality of the generation. We introduce two evaluation settings, free-form tagging and iterative tagging, to assess different verbalized confidence estimation methods. Experiments on three long-form QA datasets show that our RL-trained models achieve better calibration and generalize robustly across domains. Our method is also highly efficient: it is 20× faster than traditional self-consistency methods while achieving better calibration.

Anthology ID: 2026.acl-long.1539
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33336–33363
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1539/
Cite (ACL): Caiqi Zhang, Xiaochen Zhu, Chengzu Li, Nigel Collier, and Andreas Vlachos. 2026. LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33336–33363, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1539.pdf
Checklist: 2026.acl-long.1539.checklist.pdf