DisCal: Distribution-Aware Calibration for Mathematical Reasoning Under Character-Level Noisy Inputs

Bo Zhang, Jiawei Zhang, Cong Gao, Bingxu Han, Minghao Hu, Jun Zhang, Yunbo Cao, Zhunchen Luo, Wen Yao, Guotong Geng, Zhong Wang


Abstract
Although large reasoning models (LRMs) exhibit exceptional mathematical reasoning capabilities on clean inputs, their reasoning accuracy drops substantially in the presence of character-level noise such as typographical errors. Critically, their confidence estimates fail to reflect the corresponding decline in reasoning accuracy. While confidence calibration offers a principled solution, existing methods predominantly target clean inputs, leaving noisy scenarios largely unexplored. To address this gap, we propose DisCal (Distribution-aware Calibration), a confidence calibration framework for character-level noisy inputs. DisCal extracts uncertainty signals from both the empirical answer distribution and the model’s predictive distribution, and integrates them via a learned calibrator to produce well-calibrated confidence. Experiments across multiple mathematical reasoning benchmarks demonstrate that DisCal consistently outperforms existing calibration methods under noisy inputs, reducing Expected Calibration Error (ECE) by up to 39.21% and improving Area Under the Receiver Operating Characteristic Curve (AUROC) by up to 31.44%.
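The abstract describes DisCal only at a high level. The code below is a minimal, hypothetical sketch, not the authors' implementation. It makes several assumptions:

- The empirical answer distribution over N sampled final answers is summarized by the majority-answer frequency and the normalized entropy.
- The model's predictive distribution is summarized by the geometric-mean and minimum token probability of the chosen answer.
- The "learned calibrator" is a plain logistic regression.

It also shows one standard way to compute the two reported metrics: ECE with equal-width bins, and AUROC as a pairwise rank statistic. All function names are illustrative.

```python
import math
from collections import Counter


def empirical_features(answers):
    """Summarize the empirical distribution of N sampled final answers (assumed features)."""
    counts = Counter(answers)
    n = len(answers)
    top_answer, top_count = counts.most_common(1)[0]
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Divide by log(n) so the entropy feature lies in [0, 1].
    norm_entropy = entropy / math.log(n) if n > 1 else 0.0
    return top_answer, [top_count / n, norm_entropy]


def predictive_features(token_logprobs):
    """Summarize the model's token log-probabilities for the chosen answer (assumed features)."""
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    # Geometric-mean token probability and weakest-token probability.
    return [math.exp(mean_lp), math.exp(min(token_logprobs))]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibrator(X, y, lr=0.5, epochs=2000):
    """Hypothetical calibrator: logistic regression fit by full-batch gradient descent on log loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Calibrated confidence that the answer is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def expected_calibration_error(confs, correct, n_bins=10):
    """ECE with equal-width bins (lo, hi]; a confidence of exactly 0 goes to bin 0."""
    n, ece = len(confs), 0.0
    for k in range(n_bins):
        lo, hi = k / n_bins, (k + 1) / n_bins
        idx = [i for i, c in enumerate(confs) if lo < c <= hi or (k == 0 and c == 0.0)]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            avg_conf = sum(confs[i] for i in idx) / len(idx)
            ece += len(idx) / n * abs(acc - avg_conf)
    return ece


def auroc(confs, correct):
    """Probability that a correct answer receives higher confidence than a wrong one."""
    pos = [c for c, y in zip(confs, correct) if y == 1]
    neg = [c for c, y in zip(confs, correct) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch, each question's feature vector would be `empirical_features(answers)[1] + predictive_features(logprobs)`. The calibrator would be fit on held-out questions, each labeled by whether its majority answer is correct. Lower ECE means confidence tracks accuracy more closely. Higher AUROC means confidence better separates correct answers from wrong ones.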
Anthology ID:
2026.acl-long.660
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
14484–14507
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.660/
Cite (ACL):
Bo Zhang, Jiawei Zhang, Cong Gao, Bingxu Han, Minghao Hu, Jun Zhang, Yunbo Cao, Zhunchen Luo, Wen Yao, Guotong Geng, and Zhong Wang. 2026. DisCal: Distribution-Aware Calibration for Mathematical Reasoning Under Character-Level Noisy Inputs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14484–14507, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
DisCal: Distribution-Aware Calibration for Mathematical Reasoning Under Character-Level Noisy Inputs (Zhang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.660.pdf
Checklist:
2026.acl-long.660.checklist.pdf