EventRelBench: A Comprehensive Benchmark for Evaluating Event Relation Understanding in Large Language Models

Jie Gong, Biaoshuai Zheng, Qiwang Hu


Abstract
Understanding event relationships is critical for tasks such as narrative comprehension, information extraction, and reasoning in natural language processing. Despite the remarkable advancements of large language models (LLMs) across diverse NLP tasks, current studies have not systematically evaluated their ability to capture the complexity of event relations. To this end, we assess LLMs on event relation extraction (ERE) by designing the benchmark EventRelBench. EventRelBench comprises 35K diverse event relation questions covering four key categories (coreference, temporal, causal, and supersub relations), provided at two levels of granularity: document level and sentence level. Extensive experiments on LLMs of different sizes and types show that existing LLMs still fall short in accurately extracting and understanding event relationships. To address this gap, we introduce EventRelInst, a 48K instruction fine-tuning dataset for event relation extraction. Experimental results not only highlight the shortcomings of current general-purpose LLMs in extracting event relationships but also demonstrate the effectiveness of EventRelInst. Both EventRelBench and EventRelInst will be publicly available.
Anthology ID:
2025.findings-emnlp.482
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9084–9099
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.482/
DOI:
10.18653/v1/2025.findings-emnlp.482
Cite (ACL):
Jie Gong, Biaoshuai Zheng, and Qiwang Hu. 2025. EventRelBench: A Comprehensive Benchmark for Evaluating Event Relation Understanding in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9084–9099, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
EventRelBench: A Comprehensive Benchmark for Evaluating Event Relation Understanding in Large Language Models (Gong et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.482.pdf
Checklist:
2025.findings-emnlp.482.checklist.pdf