@inproceedings{dias-alexiou-etal-2025-depth,
    title = "An in-depth human study of the mathematical reasoning abilities in Large Language Models",
    author = "Dias-Alexiou, Carolina  and
      Marrese-Taylor, Edison  and
      Matsuo, Yutaka",
    editor = "Valentino, Marco  and
      Ferreira, Deborah  and
      Thayaparan, Mokanarangan  and
      Ranaldi, Leonardo  and
      Freitas, Andre",
    booktitle = "Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.14/",
    pages = "186--194",
    ISBN = "979-8-89176-348-7",
    abstract = "We study the generalization capabilities of large language models (LLM) through the lens of mathematical reasoning, asking if these models can recognize that two structures are the same even when they do not share the same nomenclature. We propose a human study to evaluate if LLMs reproduce proofs that they have most likely seen during training, but when the symbols do not match the ones seen. To test this in a controlled scenario, we look at proofs in \textit{propositional calculus}, foundational for other logic systems, semantically complete and widely discussed online. We replace the implication operator ($\rightarrow$) with an unrelated, arbitrary symbol ($\spadesuit$) and ask experts to evaluate how the output of a selection of LLMs changes in terms of compliance, correctness, extensiveness and coherence. Our results show that nearly all our tested models produce lower quality proofs in this test, in particular open-weights models, suggesting the abilities of these LLMs to reason in this context have important limitations."
}
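For illustration only, here is a minimal Python sketch of the symbol-substitution manipulation the abstract describes: rewriting a propositional-calculus statement so the implication operator (→) is replaced by an arbitrary symbol (♠) before prompting a model. The function names and prompt wording below are assumptions made for this sketch, not the authors' actual experimental code.

```python
# Hypothetical sketch of the prompt manipulation described in the abstract.
# Names and prompt wording are illustrative assumptions, not the paper's code.

def substitute_operator(formula: str, original: str = "→", replacement: str = "♠") -> str:
    """Replace every occurrence of the implication symbol with an arbitrary one."""
    return formula.replace(original, replacement)


def build_prompt(formula: str) -> str:
    """Wrap the rewritten statement in a proof request (wording is an assumption)."""
    return (
        "In the following, the symbol ♠ is a binary logical connective. "
        f"Prove the statement: {formula}"
    )


if __name__ == "__main__":
    # Example: hypothetical syllogism, normally written (p → q) ∧ (q → r) ⊢ p → r
    original = "(p → q) ∧ (q → r) ⊢ p → r"
    rewritten = substitute_operator(original)
    print(build_prompt(rewritten))
    # ... Prove the statement: (p ♠ q) ∧ (q ♠ r) ⊢ p ♠ r
```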