Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

Youliang Yuan; Qiuyang Mang; Jingbang Chen; Hong Wan; Xiaoyuan Liu; Junjielong Xu; Jen-tse Huang; Wenxuan Wang; Wenxiang Jiao; Pinjia He

Abstract

In this paper, we observe that current models are susceptible to reward hacking, leading to a substantial overestimation of a model’s reasoning ability. This is evidenced by a high incidence of “false positives”: solutions that reach the correct answer through an unsound process. Through a systematic analysis with human verification, we establish a taxonomy of these failure modes, identifying patterns like Miracle Steps, which are abrupt jumps to a correct output without a valid preceding derivation. Probing experiments suggest that these Miracle Steps are linked to answer-recall shortcuts, including memorization from pretraining, where the model accesses the correct answer independently of its reasoning chain. To mitigate this systemic issue, we introduce the Rubric Reward Model (RRM), a process-oriented reward function that evaluates the entire reasoning trajectory against problem-specific rubrics. The RRM explicitly penalizes logical flaws and encourages rigorous deduction. When integrated into an RL pipeline, RRM-based training consistently outperforms outcome-only supervision across four math benchmarks. Notably, it boosts Verified Pass@1024 on AIME2024 from 26.7% to 62.6% and reduces the incidence of Miracle Steps by 71%. Our work demonstrates that rewarding the solution process is crucial for building accurate and reliable models.

Anthology ID: 2026.acl-long.844
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 18556–18577
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.844/
Cite (ACL): Youliang Yuan, Qiuyang Mang, Jingbang Chen, Hong Wan, Xiaoyuan Liu, Junjielong Xu, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, and Pinjia He. 2026. Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18556–18577, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards (Yuan et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.844.pdf
Checklist: 2026.acl-long.844.checklist.pdf
