ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving

Zain Ul Abedin, Shahzeb Qamar, Lucie Flek, Akbar Karimi


Abstract
While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well studied. We propose ArithmAttack to examine how robust LLMs are when they encounter noisy prompts containing extra punctuation marks. While easy to implement, ArithmAttack causes no information loss, since no words are added to or deleted from the context. We evaluate the robustness of eight LLMs, including LLama3, Mistral, Mathstral, and DeepSeek, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models are vulnerable to such noise, with more noise leading to poorer performance.
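
For illustration, below is a minimal Python sketch of the kind of punctuation-noise injection the abstract describes. The function name inject_punctuation_noise, the noise_ratio parameter, and the punctuation set are illustrative assumptions, not the authors' exact implementation.

import random
import string

def inject_punctuation_noise(text, noise_ratio=0.3, seed=None):
    """Insert random punctuation marks between the words of `text`.

    No words are added or removed, so the original content is preserved.
    `noise_ratio` controls how many marks are inserted relative to the
    number of words (an assumed parameterization, not the paper's).
    """
    rng = random.Random(seed)
    words = text.split()
    num_insertions = max(1, int(len(words) * noise_ratio))
    for _ in range(num_insertions):
        pos = rng.randrange(len(words) + 1)    # pick a gap between words
        mark = rng.choice(string.punctuation)  # e.g. '!', ';', '#'
        words.insert(pos, mark)
    return " ".join(words)

# Example: noising a GSM8K-style question before prompting a model
question = "Natalia sold 48 clips in April. How many clips did she sell?"
print(inject_punctuation_noise(question, noise_ratio=0.3, seed=0))

Raising noise_ratio corresponds to the paper's observation that heavier noise degrades accuracy further.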
Anthology ID:
2025.llmsec-1.5
Volume:
Proceedings of the First Workshop on LLM Security (LLMSEC)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editor:
Jekaterina Novikova
Venues:
LLMSEC | WS
SIG:
SIGSEC
Publisher:
Association for Computational Linguistics
Pages:
48–53
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.llmsec-1.5/
Cite (ACL):
Zain Ul Abedin, Shahzeb Qamar, Lucie Flek, and Akbar Karimi. 2025. ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving. In Proceedings of the First Workshop on LLM Security (LLMSEC), pages 48–53, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving (Ul Abedin et al., LLMSEC 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.llmsec-1.5.pdf
Supplementary material:
2025.llmsec-1.5.SupplementaryMaterial.zip
Supplementary material:
2025.llmsec-1.5.SupplementaryMaterial.txt