EpiCaR: Knowing What You Don’t Know Matters for Better Reasoning in LLMs

Jewon Yeom, Jaewon Sok, Seonghyeon Park, Jeongjae Park, Taesup Kim


Abstract
Improving the reasoning abilities of large language models (LLMs) has largely relied on iterative self-training with model-generated data. While effective at boosting accuracy, existing approaches primarily reinforce successful reasoning paths, incurring a substantial calibration cost: models become overconfident and lose the ability to represent uncertainty. This failure has been characterized as a form of model collapse in alignment, where predictive distributions degenerate toward low-variance point estimates. We address this issue by reframing open-ended reasoning training as an epistemic learning problem, in which models must learn not only how to reason, but also when their reasoning should be trusted. We propose epistemically-calibrated reasoning (EpiCaR) as a training objective that jointly optimizes reasoning performance and calibration, and instantiate it within an iterative supervised fine-tuning framework using explicitly extracted meta-cognitive self-evaluation signals. Experiments on Llama-3 and Qwen-3 families demonstrate that our approach achieves Pareto-superiority over standard baselines in both accuracy and calibration, particularly in models with sufficient reasoning capacity (e.g., 3B+). This framework generalizes effectively to OOD mathematical reasoning (GSM8K) and code generation (MBPP). Ultimately, our approach enables a reduction in the overall inference compute budget, matching the K=30 majority-vote performance of STaR with only K=10 confidence-weighted samples, entirely without the multi-model overhead of external verifiers.
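The abstract contrasts plain majority voting with confidence-weighted aggregation over K sampled answers but does not spell out the aggregation rule. The sketch below is only an illustration under one assumption: each sample carries a self-reported confidence in [0, 1], and the confidences are summed per distinct answer. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def majority_vote(answers):
    """Return the answer that appears most often among the samples."""
    counts = defaultdict(float)
    for answer in answers:
        counts[answer] += 1.0
    return max(counts, key=counts.get)


def confidence_weighted_vote(samples):
    """Return the answer with the largest summed confidence.

    `samples` is a list of (answer, confidence) pairs. Summing the
    confidences per answer is an assumed rule, used here for illustration.
    """
    scores = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    return max(scores, key=scores.get)


# Toy data: three low-confidence samples agree on "41",
# two high-confidence samples agree on "42".
samples = [("42", 0.9), ("41", 0.3), ("41", 0.2), ("42", 0.8), ("41", 0.4)]

print(majority_vote([answer for answer, _ in samples]))  # prints 41
print(confidence_weighted_vote(samples))                 # prints 42
```

In this toy case the two rules disagree: the count favors "41", while the summed confidence favors "42". Weighting can only help when the confidence values are well calibrated, which is the property the abstract says the training objective targets.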
Anthology ID:
2026.acl-long.1026
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
22414–22443
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1026/
Cite (ACL):
Jewon Yeom, Jaewon Sok, Seonghyeon Park, Jeongjae Park, and Taesup Kim. 2026. EpiCaR: Knowing What You Don’t Know Matters for Better Reasoning in LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22414–22443, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
EpiCaR: Knowing What You Don’t Know Matters for Better Reasoning in LLMs (Yeom et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1026.pdf
Checklist:
 2026.acl-long.1026.checklist.pdf