Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

Zhikun Xu; Xiaodong Yu; Ben Zhou; Jiang Liu; Jialian Wu; Ze Wang; Ximeng Sun; Hao Chen; Zicheng Liu

Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

Zhikun Xu, Xiaodong Yu, Ben Zhou, Jiang Liu, Jialian Wu, Ze Wang, Ximeng Sun, Hao Chen, Zicheng Liu

Abstract

Recent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural-language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent in-domain gains over both a vanilla model and a single-label RL baseline, larger improvements on applicability-breaking perturbations, and parity or modest gains on end-to-end tasks; ablations indicate that the two-section outputs and section-aware reinforcement are both necessary for robustness.

Anthology ID:: 2026.acl-short.20
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 216–226
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-short.20/
DOI:
Bibkey:
Cite (ACL):: Zhikun Xu, Xiaodong Yu, Ben Zhou, Jiang Liu, Jialian Wu, Ze Wang, Ximeng Sun, Hao Chen, and Zicheng Liu. 2026. Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 216–226, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning (Xu et al., ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-short.20.pdf
Checklist:: 2026.acl-short.20.checklist.pdf

PDF Cite Search Checklist Fix data