PiCSAR: Probabilistic Confidence Selection and Ranking for Reasoning Chains

Joshua Ong Jun Leang, Zheng Zhao, Aryo Pradipta Gema, Sohee Yang, Wai-Chung Kwan, Xuanli He, Wenda Li, Pasquale Minervini, Eleonora Giunchiglia, Shay B Cohen


Abstract
Best-of-n sampling improves the accuracy of large language models (LLMs) and large reasoning models (LRMs) by generating multiple candidate solutions and selecting the one with the highest reward. The key challenge for reasoning tasks is designing a scoring function that can identify correct reasoning chains without access to ground-truth answers. We propose Probabilistic Confidence Selection and Ranking for Reasoning Chains (PiCSAR), a simple, training-free method that scores each candidate generation using the joint log-likelihood of the reasoning and the final answer. The score draws on both the reasoning path (*reasoning confidence*) and the final answer (*answer confidence*). PiCSAR achieves substantial gains across several benchmarks (+11.7 on AIME2024, +9.81 on AIME2025), outperforming baselines with at least 2x fewer samples in 20 out of 25 comparisons. Our analysis reveals that correct reasoning chains exhibit higher reasoning and answer confidence, justifying the effectiveness of PiCSAR.
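The selection rule described in the abstract can be illustrated with a minimal sketch. It assumes per-token log-probabilities for each candidate are already available and already split into a reasoning part and a final-answer part. The names `Candidate`, `joint_log_likelihood`, and `select_best` are hypothetical, and the plain unnormalised sum below is an assumption: the abstract does not state how the two confidence terms are computed or whether any length normalisation is applied.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One sampled generation (hypothetical container, not from the paper)."""
    answer: str
    # log p(token | prompt, previous tokens) for each reasoning token
    reasoning_logprobs: Sequence[float]
    # log p(token | prompt, reasoning, previous answer tokens) for each answer token
    answer_logprobs: Sequence[float]


def joint_log_likelihood(c: Candidate) -> float:
    """Sum of reasoning and answer log-probabilities (assumed, unnormalised)."""
    reasoning_confidence = sum(c.reasoning_logprobs)
    answer_confidence = sum(c.answer_logprobs)
    return reasoning_confidence + answer_confidence


def select_best(candidates: List[Candidate]) -> Candidate:
    """Best-of-n selection: return the candidate with the highest joint score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=joint_log_likelihood)


# Toy example with made-up log-probabilities.
candidates = [
    Candidate(answer="42", reasoning_logprobs=[-0.2, -0.3], answer_logprobs=[-0.1]),
    Candidate(answer="41", reasoning_logprobs=[-0.1, -0.1], answer_logprobs=[-2.0]),
]
best = select_best(candidates)
print(best.answer)
```

In this toy example the second candidate has the more likely reasoning tokens but a much less likely answer, so the joint score favours the first. No ground-truth answer or trained reward model is involved, which is the training-free property the abstract highlights.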
Anthology ID:
2026.findings-acl.1577
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
31511–31544
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1577/
Cite (ACL):
Joshua Ong Jun Leang, Zheng Zhao, Aryo Pradipta Gema, Sohee Yang, Wai-Chung Kwan, Xuanli He, Wenda Li, Pasquale Minervini, Eleonora Giunchiglia, and Shay B Cohen. 2026. PiCSAR: Probabilistic Confidence Selection and Ranking for Reasoning Chains. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31511–31544, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
PiCSAR: Probabilistic Confidence Selection and Ranking for Reasoning Chains (Leang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1577.pdf
Checklist:
 2026.findings-acl.1577.checklist.pdf