Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification

Yiju Guo; Tianyi Hu; Zexu Sun; Yankai Lin (林衍凯)

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has advanced LLM reasoning, but it remains constrained by inefficient exploration under limited rollout budgets, which leads to low sampling success and unstable training on complex tasks. We find that many exploration failures arise not from problem difficulty, but from a small number of prompt tokens that introduce interference. Building on this insight, we propose the Less Noise Sampling Framework (LENS), which first purifies prompts by identifying and removing interference tokens, then transfers successful rollouts from the purified setting to supervise policy optimization on the original noisy prompts, enabling the model to learn to ignore interference in real-world, noisy prompting settings. Experimental results show that LENS significantly outperforms GRPO, delivering higher performance and faster convergence, with a 3.88% average gain and over 1.6× speedup. Our work highlights the critical role of pruning interference tokens in improving rollout efficiency, offering a new perspective for RLVR research.
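
The two steps named in the abstract (purify the prompt, then reuse successful rollouts as supervision for the original prompt) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how interference tokens are identified, so the per-token `scores`, the `max_remove` budget, and the `sample` and `verify` callables are all hypothetical placeholders, and the policy-optimization step itself is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def purify(tokens: Sequence[str], scores: Sequence[float], max_remove: int) -> Tokens:
    """Drop the `max_remove` highest-scoring tokens, preserving token order.

    ASSUMPTION: `scores` is a hypothetical per-token interference score;
    the abstract does not specify how such scores are computed.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:max_remove])
    return [t for i, t in enumerate(tokens) if i not in drop]


def collect_training_pairs(
    tokens: Sequence[str],
    scores: Sequence[float],
    sample: Callable[[Tokens], str],
    verify: Callable[[str], bool],
    n_rollouts: int,
    max_remove: int = 1,
) -> List[Tuple[Tokens, str]]:
    """Sample rollouts on the purified prompt and keep the verified ones.

    Each kept rollout is paired with the ORIGINAL (noisy) prompt, so a
    downstream update would supervise the policy on the noisy input.
    ASSUMPTION: `sample` and `verify` stand in for the policy and the
    verifiable reward; both are hypothetical callables.
    """
    original = list(tokens)
    purified = purify(original, scores, max_remove)
    rollouts = [sample(purified) for _ in range(n_rollouts)]
    return [(original, r) for r in rollouts if verify(r)]


if __name__ == "__main__":
    # Toy policy: answers correctly only when the distractor token is absent.
    prompt = ["what", "is", "2+2", "distractor"]
    scores = [0.0, 0.0, 0.0, 1.0]
    toy_sample = lambda p: "4" if "distractor" not in p else "5"
    toy_verify = lambda r: r == "4"
    print(collect_training_pairs(prompt, scores, toy_sample, toy_verify, n_rollouts=2))
```

In this toy setup the policy fails on the noisy prompt but succeeds once the single high-scoring token is removed, and the returned pairs still carry the noisy prompt as the input side.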

Anthology ID: 2026.findings-acl.659
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13469–13482
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.659/
Cite (ACL): Yiju Guo, Tianyi Hu, Zexu Sun, and Yankai Lin. 2026. Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13469–13482, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification (Guo et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.659.pdf
Checklist: 2026.findings-acl.659.checklist.pdf
