From Causal Parrots to Causal Prophets? Towards Sound Causal Reasoning with Large Language Models

Rahul Babu Shrestha, Simon Malberg, Georg Groh


Abstract
Causal reasoning is a fundamental property of human and machine intelligence. While large language models (LLMs) excel in many natural language tasks, their ability to infer causal relationships beyond memorized associations is debated. This study systematically evaluates recent LLMs’ causal reasoning across the three levels of Pearl’s Ladder of Causation (associational, interventional, and counterfactual) as well as across commonsensical, anti-commonsensical, and nonsensical causal structures, using the CLadder dataset. We further explore the effectiveness of prompting techniques, including chain of thought (CoT), self-consistency (SC), and causal chain of thought (CausalCoT), in enhancing causal reasoning, and propose two new techniques: causal tree of thoughts (CausalToT) and causal program of thoughts (CausalPoT). While larger models tend to outperform smaller ones and are generally more robust against perturbations, our results indicate that all tested LLMs still have difficulties, especially with counterfactual reasoning. However, CausalToT and CausalPoT significantly improve performance over existing prompting techniques, suggesting that hybrid approaches combining LLMs with formal reasoning frameworks can mitigate these limitations. Our findings contribute to understanding LLMs’ reasoning capacities and outline promising strategies for improving their ability to reason causally as humans would. We release our code and data.
Anthology ID:
2025.nlp4dh-1.29
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
319–333
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.29/
Cite (ACL):
Rahul Babu Shrestha, Simon Malberg, and Georg Groh. 2025. From Causal Parrots to Causal Prophets? Towards Sound Causal Reasoning with Large Language Models. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 319–333, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
From Causal Parrots to Causal Prophets? Towards Sound Causal Reasoning with Large Language Models (Babu Shrestha et al., NLP4DH 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.29.pdf