MDC-Bench: A Multidisciplinary Causal Benchmark Based on Causal Structures for Evaluating Large Language Models

Peng Wang; Yuxiong Yan; Xiao Ding; Kai Xiong; Bibo Cai; Chao Peng; Yutai Hou; Dandan Tu; Bing Qin (秦兵); Ting Liu

Abstract

Existing causal datasets focus primarily on the commonsense domain, where questions mostly involve simple, single-hop, direct causal relationships. When a model already possesses the relevant knowledge, it can reach the correct answer through knowledge matching alone, even without understanding the causal relationship. However, LLMs often perform poorly on questions that involve complex causal structures and require domain-specific expertise. To address these challenges, we propose MDC-Bench, a multidisciplinary causal evaluation benchmark. MDC-Bench adopts a three-level causal framework comprising 4 core causal tasks, and its samples cover 7 representative disciplines and diverse causal structures. Because multidisciplinary knowledge has limited coverage in the pre-training phase, models cannot answer these questions by knowledge matching alone. The diverse causal structures force models to grasp the underlying causal logic. We also increase task complexity through methods such as compound causal operations, aiming to enhance discriminability among models. MDC-Bench thus offers improvements in domain specialization, structural diversity, and task complexity. Through extensive evaluation, we observe that even advanced models have substantial room for improvement. MDC-Bench not only establishes a standardized baseline for causal research but also provides valuable insights for applying LLMs in multiple domains.

Anthology ID: 2026.findings-acl.1409
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 28254–28297
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1409/
Cite (ACL): Peng Wang, Yuxiong Yan, Xiao Ding, Kai Xiong, Bibo Cai, Chao Peng, Yutai Hou, Dandan Tu, Bing Qin, and Ting Liu. 2026. MDC-Bench: A Multidisciplinary Causal Benchmark Based on Causal Structures for Evaluating Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28254–28297, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MDC-Bench: A Multidisciplinary Causal Benchmark Based on Causal Structures for Evaluating Large Language Models (Wang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1409.pdf
Checklist: 2026.findings-acl.1409.checklist.pdf
