GSM-Noise: Exploring and Enhancing Large Language Models’ Reasoning under Noisy Inputs
Zhengxin Zhang, Chengyu Huang, Xufu Liu, Dan Zhao, Jinyan Su, Claire Cardie
Abstract
Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet they often struggle with complex, ill-formed, or noisy inputs that frequently occur in interactions with real users. LLMs typically lack the refinement capabilities needed to filter out irrelevant details and restructure key points before reasoning over the text and responding, which results in suboptimal performance and incorrect answers. From an information-theoretic perspective, this behavior is akin to decoding a high-entropy problem without first reducing its entropy. In this work, we first introduce GSM-Noise, a benchmark of grade-school math problems systematically perturbed to reflect real-world input variability. We show that the reasoning ability of open-source models (e.g., the LLaMA and Qwen series) can be compromised by noise, while closed-source models are more robust. To improve LLM robustness under noisy conditions, we propose that LLMs first refine inputs, thereby reducing their entropy, before engaging in in-depth analysis. We investigate three approaches to instill this refinement capability: prompt engineering (PE), supervised finetuning (SFT), and reinforcement learning (RL). Experimental results show that input refinement leads to consistent performance gains: 2–12% with PE, 4–13% with SFT, and 3–25% with RL. These results highlight the importance of incorporating an explicit refinement phase to enhance the robustness and reliability of LLM reasoning in real-world scenarios.
- Anthology ID:
- 2026.findings-acl.1748
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35020–35045
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1748/
- Cite (ACL):
- Zhengxin Zhang, Chengyu Huang, Xufu Liu, Dan Zhao, Jinyan Su, and Claire Cardie. 2026. GSM-Noise: Exploring and Enhancing Large Language Models’ Reasoning under Noisy Inputs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35020–35045, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- GSM-Noise: Exploring and Enhancing Large Language Models’ Reasoning under Noisy Inputs (Zhang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1748.pdf