Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

Paul He, Yinya Huang, Mrinmaya Sachan, Zhijing Jin

Abstract
Large language models (LLMs) are increasingly applied to tasks involving causal reasoning. However, current benchmarks often rely on string matching or surface-level metrics that fail to assess whether a model’s output is formally valid under causal semantics. We propose DoVerifier, a symbolic verification framework that checks whether LLM-generated causal expressions are derivable from a given causal graph using rules from do-calculus and probability theory. This allows us to recover correct answers that would otherwise be marked incorrect due to superficial differences. Evaluations on synthetic data and causal QA benchmarks show that DoVerifier more accurately captures semantic correctness than standard metrics, offering a more rigorous and informative way to evaluate LLMs on causal tasks.
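To make the abstract's idea concrete, the sketch below illustrates one kind of check such a verifier must perform; it is a minimal illustration, not the paper's DoVerifier implementation. It tests Rule 2 of the do-calculus, which licenses rewriting P(y | do(x)) as P(y | x) when Y and X are d-separated in the graph with X's outgoing edges removed. The graphs, variable names, and reliance on networkx's d-separation utilities are illustrative assumptions.

```python
# Minimal sketch of one do-calculus check; NOT the paper's DoVerifier.
# Rule 2 (Pearl): P(y | do(x)) may be rewritten as P(y | x) when
# Y and X are d-separated in G with X's outgoing edges removed.
import networkx as nx

# networkx renamed d_separated -> is_d_separator in 3.3; support both.
_d_separated = getattr(nx, "is_d_separator", None) or nx.d_separated

def rule2_applies(graph: nx.DiGraph, x: str, y: str) -> bool:
    """True if Rule 2 licenses P(y | do(x)) -> P(y | x) in `graph`."""
    g_cut = graph.copy()
    g_cut.remove_edges_from(list(graph.out_edges(x)))  # cut X's outgoing edges
    return _d_separated(g_cut, {x}, {y}, set())

# Confounded graph X <- Z -> Y, X -> Y: the backdoor path through Z
# survives the cut, so the rewrite is NOT licensed.
g = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "Y")])
print(rule2_applies(g, "X", "Y"))  # False

# Unconfounded X -> Y: after the cut, X and Y are d-separated, so
# P(y | do(x)) = P(y | x) holds.
print(rule2_applies(nx.DiGraph([("X", "Y")]), "X", "Y"))  # True
```

A full verifier along the abstract's lines would chain such rule applications with standard probability axioms to decide whether a model's expression and the reference expression reduce to the same estimand, rather than comparing their surface strings.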
Anthology ID: 2026.eacl-long.56
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 1231–1250
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.56/
Cite (ACL): Paul He, Yinya Huang, Mrinmaya Sachan, and Zhijing Jin. 2026. Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1231–1250, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification (He et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.56.pdf