Large Language and Reasoning Models are Shallow Disjunctive Reasoners

Irtaza Khalid, Amir Masoud Nourollah, Steven Schockaert


Abstract
Large Language Models (LLMs) have been found to struggle with systematic reasoning. Even on tasks where they appear to perform well, their performance often depends on shortcuts rather than on genuine reasoning abilities, leading them to collapse on out-of-distribution (OOD) examples. Post-training strategies based on reinforcement learning and chain-of-thought prompting have recently been hailed as a step change. However, little is known about the potential of the resulting “Large Reasoning Models” (LRMs) beyond maths and programming-based problem solving, where genuine OOD problems can be sparse. In this paper, we focus on tasks that require systematic relational composition for qualitative spatial and temporal reasoning. This setting allows fine-grained control over problem difficulty, which lets us measure OOD generalization precisely. We find that zero-shot LRMs generally outperform their LLM counterparts on single-path reasoning tasks but struggle in the multi-path setting. Whilst fine-tuned LLMs show comparatively better results, they are likewise incapable of multi-path generalization. We also provide evidence for a behavioral interpretation of these findings, namely that LRMs are shallow disjunctive reasoners.
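To make the task setting concrete, the minimal sketch below illustrates what systematic relational composition involves and why the multi-path setting is harder. It uses a toy qualitative point algebra over {<, =, >}; the composition table, function names, and example queries are illustrative assumptions for exposition only, not the paper's actual benchmark, prompts, or evaluation code.

```python
# Toy qualitative point algebra over the relations {"<", "=", ">"}.
# Composing two relations can yield a disjunction (a set of possible
# relations). This is what makes multi-path reasoning harder: the
# disjunctive answers from different paths must be intersected.
COMPOSE = {
    ("<", "<"): {"<"},
    ("<", "="): {"<"},
    ("<", ">"): {"<", "=", ">"},  # underdetermined: full disjunction
    ("=", "<"): {"<"},
    ("=", "="): {"="},
    ("=", ">"): {">"},
    (">", "<"): {"<", "=", ">"},  # underdetermined: full disjunction
    (">", "="): {">"},
    (">", ">"): {">"},
}

def compose_path(relations):
    """Single-path reasoning: compose a chain of relations left to right."""
    result = {relations[0]}
    for r in relations[1:]:
        result = {c for a in result for c in COMPOSE[(a, r)]}
    return result

def combine_paths(paths):
    """Multi-path reasoning: intersect the disjunctions from each path."""
    answer = {"<", "=", ">"}
    for path in paths:
        answer &= compose_path(path)
    return answer

# Single path: a < b and b < c entails a < c.
print(compose_path(["<", "<"]))                 # {'<'}

# Two paths between the same endpoints: the first path alone is
# underdetermined ({'<', '=', '>'}), but intersecting it with the
# second path pins the answer down.
print(combine_paths([["<", ">"], ["<", "="]]))  # {'<'}
```

Running the script prints {'<'} for both queries: the single chain composes deterministically, whereas the underdetermined first path only becomes informative once intersected with the second. A shallow disjunctive reasoner, in these terms, would handle the chain composition but fail to combine evidence across paths.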
Anthology ID:
2025.acl-long.433
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8843–8869
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.433/
Cite (ACL):
Irtaza Khalid, Amir Masoud Nourollah, and Steven Schockaert. 2025. Large Language and Reasoning Models are Shallow Disjunctive Reasoners. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8843–8869, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Large Language and Reasoning Models are Shallow Disjunctive Reasoners (Khalid et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.433.pdf