CausalLink: An Interactive Evaluation Framework for Causal Reasoning

Jinyue Feng, Frank Rudzicz


Abstract
We present CausalLink, an innovative evaluation framework that interactively assesses the ability of conversational language models to identify the correct causal intervention. Each CausalLink test case creates a hypothetical environment in which the language model is instructed to apply interventions to entities whose interactions follow predefined causal relations generated from controllable causal graphs. Our evaluation framework isolates causal capabilities from the confounding effects of world knowledge and semantic cues. We evaluate a series of LLMs in a scenario featuring movements of geometric shapes and find that models begin to exhibit reliable reasoning over two or three variables at the 14-billion-parameter scale. However, the performance of state-of-the-art models such as GPT-4o degrades below random chance as the number of variables increases. We identify and analyze several key failure modes.
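As a rough illustration of the setup the abstract describes, the sketch below builds a small random causal chain over geometric shapes, derives which interventions causally reach a target shape (the do-operator on a chain graph), and scores a stubbed model answer against that graph-derived ground truth. All names, the chain-only graph generator, and the harness structure are hypothetical; the paper's actual prompts, graph family, and interaction protocol may differ.

import random

def random_chain_dag(variables):
    """Return a random chain DAG as {cause: effect} edges, e.g. A -> B -> C."""
    order = random.sample(variables, len(variables))
    return {a: b for a, b in zip(order, order[1:])}

def descendants(dag, node):
    """All variables that move when `node` is intervened on (do-operator)."""
    out, frontier = set(), [node]
    while frontier:
        nxt = dag.get(frontier.pop())
        if nxt is not None and nxt not in out:
            out.add(nxt)
            frontier.append(nxt)
    return out

def correct_interventions(dag, target):
    """Variables whose intervention causally reaches `target`."""
    nodes = set(dag) | set(dag.values())
    return {v for v in nodes if target in descendants(dag, v)}

# One interactive episode: ask the model which shape to move (stubbed here).
shapes = ["circle", "square", "triangle"]
dag = random_chain_dag(shapes)
target = random.choice([s for s in shapes if s in dag.values()])  # has a cause

def model_answer(prompt):           # stand-in for a real chat-model call
    return random.choice(shapes)    # a random-chance baseline

answer = model_answer(f"Which shape should you move so that the {target} moves?")
print("DAG:", dag, "| target:", target, "| answer:", answer,
      "| correct:", answer in correct_interventions(dag, target))

Scoring against graph-derived ground truth, rather than against natural-language references, is what allows the benchmark to scale the number of variables while keeping the entities semantically meaningless, per the abstract's stated goal of isolating causal reasoning from world knowledge.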
Anthology ID:
2025.findings-acl.1147
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22313–22326
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1147/
Cite (ACL):
Jinyue Feng and Frank Rudzicz. 2025. CausalLink: An Interactive Evaluation Framework for Causal Reasoning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 22313–22326, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
CausalLink: An Interactive Evaluation Framework for Causal Reasoning (Feng & Rudzicz, Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1147.pdf