Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference

Geonhee Kim, Marco Valentino, Andre Freitas


Abstract
Recent studies on reasoning in language models (LMs) have sparked a debate on whether they can learn systematic inferential principles or merely exploit superficial patterns in the training data. To understand and uncover the mechanisms adopted for formal reasoning in LMs, this paper presents a mechanistic interpretation of syllogistic inference. Specifically, we present a methodology for circuit discovery aimed at interpreting content-independent and formal reasoning mechanisms. Through two distinct intervention methods, we uncover a sufficient and necessary circuit involving middle-term suppression that elucidates how LMs transfer information to derive valid conclusions from premises. Furthermore, we investigate how belief biases manifest in syllogistic inference, finding evidence of partial contamination from additional attention heads responsible for encoding commonsense and contextualized knowledge. Finally, we explore the generalization of the discovered mechanisms across various syllogistic schemes, model sizes and architectures. The identified circuit is sufficient and necessary for syllogistic schemes on which the models achieve high accuracy (≥ 60%), with compatible activation patterns across models of different families. Overall, our findings suggest that LMs learn transferable content-independent reasoning mechanisms, but that, at the same time, such mechanisms do not involve generalizable and abstract logical primitives, being susceptible to contamination by the same world knowledge acquired during pre-training.
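
The circuit discovery described in the abstract relies on interventions over attention heads. The sketch below illustrates one standard intervention of this kind, head-level activation patching on a symbolic syllogism, using TransformerLens. The model choice (GPT-2), the prompts, and the recovery metric are illustrative assumptions for this sketch, not the authors' exact experimental configuration.

```python
# Minimal sketch of head-level activation patching for circuit discovery.
# Assumptions (not the paper's setup): GPT-2 as the model, a symbolic
# syllogism as the clean prompt, and a middle-term-corrupted variant.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Clean syllogism vs. a corrupted variant where the middle term is swapped.
clean_prompt = "All A are B. All B are C. Therefore, all A are"
corrupt_prompt = "All A are B. All D are C. Therefore, all A are"

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)

# Cache clean-run activations so they can be patched into the corrupted run.
_, clean_cache = model.run_with_cache(clean_tokens)

answer_token = model.to_single_token(" C")  # the valid conclusion token

def patch_head(z, hook, head_index):
    # z has shape [batch, pos, head, d_head]; restore one head's clean output.
    z[:, :, head_index, :] = clean_cache[hook.name][:, :, head_index, :]
    return z

# Score each head by how much restoring it recovers the clean prediction.
results = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("z", layer)
    for head in range(model.cfg.n_heads):
        logits = model.run_with_hooks(
            corrupt_tokens,
            fwd_hooks=[(hook_name, lambda z, hook, h=head: patch_head(z, hook, h))],
        )
        results[layer, head] = logits[0, -1, answer_token]

print(results)  # heads with high scores are candidate circuit components
```

Heads whose patched-in clean activations most strongly restore the valid-conclusion logit are candidates for inclusion in the reasoning circuit; the paper's actual interventions and metrics may differ from this simplified setup.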
Anthology ID:
2025.findings-acl.525
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10074–10095
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.525/
Cite (ACL):
Geonhee Kim, Marco Valentino, and Andre Freitas. 2025. Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference. In Findings of the Association for Computational Linguistics: ACL 2025, pages 10074–10095, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference (Kim et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.525.pdf