An in-depth human study of the mathematical reasoning abilities in Large Language Models

Carolina Dias-Alexiou, Edison Marrese-Taylor, Yutaka Matsuo


Abstract
We study the generalization capabilities of large language models (LLMs) through the lens of mathematical reasoning, asking whether these models can recognize that two structures are the same even when they do not share the same nomenclature. We propose a human study to evaluate whether LLMs can reproduce proofs that they have most likely seen during training when the symbols no longer match those seen there. To test this in a controlled scenario, we look at proofs in propositional calculus, which is foundational for other logic systems, semantically complete, and widely discussed online. We replace the implication operator (→) with an unrelated, arbitrary symbol and ask experts to evaluate how the output of a selection of LLMs changes in terms of compliance, correctness, extensiveness, and coherence. Our results show that nearly all of the tested models produce lower-quality proofs in this setting, open-weights models in particular, suggesting that the ability of these LLMs to reason in this context has important limitations.
Anthology ID:
2025.mathnlp-main.14
Volume:
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Marco Valentino, Deborah Ferreira, Mokanarangan Thayaparan, Leonardo Ranaldi, Andre Freitas
Venues:
MathNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
186–194
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.14/
Cite (ACL):
Carolina Dias-Alexiou, Edison Marrese-Taylor, and Yutaka Matsuo. 2025. An in-depth human study of the mathematical reasoning abilities in Large Language Models. In Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025), pages 186–194, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
An in-depth human study of the mathematical reasoning abilities in Large Language Models (Dias-Alexiou et al., MathNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.14.pdf