Biaoshuai Zheng


2025

EventRelBench: A Comprehensive Benchmark for Evaluating Event Relation Understanding in Large Language Models
Jie Gong | Biaoshuai Zheng | Qiwang Hu
Findings of the Association for Computational Linguistics: EMNLP 2025

Understanding event relationships is critical for tasks such as narrative comprehension, information extraction, and reasoning in natural language processing. Despite the remarkable advancements of large language models (LLMs) across diverse NLP tasks, current studies have not systematically evaluated their ability to capture the complexity of event relations. To this end, we assess LLMs on event relationship extraction (ERE) by designing the benchmark EventRelBench. EventRelBench comprises 35K diverse event relation questions covering four key categories: coreference, temporal, causal, and supersub relations. These questions are provided at two levels of granularity: document-level and sentence-level. Extensive experiments on LLMs of different sizes and types show that existing LLMs still fall short in accurately extracting and understanding event relationships. To address this gap, we introduce EventRelInst, a 48K instruction fine-tuning dataset in the event relation extraction domain. Experimental results not only highlight the shortcomings of current general-purpose LLMs in extracting event relationships but also demonstrate the effectiveness of EventRelInst. Both EventRelBench and EventRelInst will be publicly available.