2025
ETRQA: A Comprehensive Benchmark for Evaluating Event Temporal Reasoning Abilities of Large Language Models
Sigang Luo | Yinan Liu | Dongying Lin | Yingying Zhai | Bin Wang | Xiaochun Yang | Junpeng Liu
Findings of the Association for Computational Linguistics: ACL 2025
Event temporal reasoning (ETR) aims to model and reason about the relationships between events and time, as well as among events, in the real world. Proficiency in ETR is a significant indicator that a large language model (LLM) truly understands the physical world. Existing question-answering datasets for evaluating ETR ability lack a systematic taxonomy and pay limited attention to compound questions. In this paper, we propose a unified taxonomy for event temporal questions and, based on it, construct ETRQA, a comprehensive benchmark for evaluating the ETR abilities of LLMs. ETRQA not only inherits and expands the evaluation content of existing datasets but also contains multiple categories of compound questions. We evaluate two leading LLM series, Llama and Qwen, on ETRQA across various settings. Our experimental results indicate that large-scale LLMs exhibit certain ETR abilities, yet they perform poorly on specific types of reasoning tasks: reasoning about time spans, reasoning over compound questions, and reasoning at fine temporal granularity. We hope ETRQA will benefit the temporal reasoning research community in future studies.
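To make the compound-question category concrete, here is a hypothetical instance in the spirit of ETRQA; the field names, the question, and the answer are invented for illustration and are not drawn from the dataset.

```python
# Hypothetical ETRQA-style compound question: it chains two reasoning
# types (event ordering and time-span comparison) in a single query.
example = {
    "question": (
        "Which program started earlier, and which one ran for longer: "
        "the Apollo program or the Space Shuttle program?"
    ),
    "reasoning_types": ["event-time ordering", "time-span comparison"],
    "granularity": "year",
    "answer": (
        "The Apollo program started earlier; the Space Shuttle program "
        "ran for longer."
    ),
}
```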
2020
Extractive Financial Narrative Summarisation based on DPPs
Lei Li | Yafei Jiang | Yinan Liu
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
We participate in the FNS-Summarisation 2020 shared task held at the FNP 2020 workshop at COLING 2020. Based on Determinantal Point Processes (DPPs), we build an extractive automatic financial summarisation system for this task. In this system, we first analyze the long report data to select the important narrative parts and generate an intermediate document. Next, we build the kernel matrix L for the intermediate document, which represents the quality of its sentences. On the basis of L, we then use a DPP sampling algorithm to choose sentences with high quality and diversity as the final summary sentences.
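A minimal sketch of the final selection step, assuming the standard quality-diversity kernel decomposition and greedy MAP inference as a stand-in for exact DPP sampling; the helper names and toy data below are illustrative, not the system's exact implementation.

```python
import numpy as np

def build_kernel(quality, embeddings):
    """Quality-diversity decomposition: L_ij = q_i * <phi_i, phi_j> * q_j."""
    phi = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.outer(quality, quality) * (phi @ phi.T)

def greedy_dpp(L, k):
    """Greedy MAP inference: repeatedly add the sentence that most
    increases log det of the selected submatrix of L."""
    selected, remaining = [], list(range(L.shape[0]))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        if best is None:  # no candidate keeps the submatrix positive definite
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 4 sentences, the first two nearly redundant.
q = np.array([0.9, 0.8, 0.85, 0.4])          # per-sentence quality scores
emb = np.array([[1.0, 0.0], [0.99, 0.1],     # sentence embeddings
                [0.0, 1.0], [0.5, 0.5]])
print(greedy_dpp(build_kernel(q, emb), k=2))  # picks a high-quality, diverse pair
```

Each added sentence must enlarge the determinant, which penalizes candidates similar to those already chosen; this is how the determinant trades off quality against diversity.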
CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
Lei Li | Yang Xie | Wei Liu | Yinan Liu | Yafei Jiang | Siya Qi | Xingyuan Li
Proceedings of the First Workshop on Scholarly Document Processing
Our system participates in two shared tasks, CL-SciSumm 2020 and LongSumm 2020. In the CL-SciSumm shared task, building on our previous work, we apply more machine learning methods to position features and content features for facet classification in Task 1B, and we introduce a GCN in Task 2 to perform extractive summarization. In the LongSumm shared task, we integrate both extractive and abstractive summarization. We tested three methods: T5 fine-tuning, DPP sampling, and GRU-GCN/GAT.
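For Task 1B, a minimal sketch of facet classification from position and content features, assuming a scikit-learn-style classifier; the two features, the toy rows, and the facet labels below are illustrative, not the exact configuration used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_features(sent_idx, n_sents, content_sim):
    """One position feature (relative location of the sentence in the paper)
    plus one content feature (e.g., similarity to the citing text span)."""
    return [sent_idx / n_sents, content_sim]

# Hypothetical training rows: (sentence index, content similarity) -> facet.
rows = [(2, 0.8), (40, 0.3), (95, 0.6)]
X = np.array([make_features(i, 100, s) for i, s in rows])
y = np.array(["method", "result", "conclusion"])  # example facet labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```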
2019
Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
Wei Liu | Lei Li | Zuying Huang | Yinan Liu
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources
The MultiLing 2019 Headline Generation Task on the Wikipedia corpus raises a critical and practical problem: a multilingual task on a low-resource corpus. In this paper we propose a QDAS extractive summarization model enhanced by sentence2vec, and we apply transfer learning based on a large multilingual pre-trained language model to the Wikipedia headline generation task. We treat headline generation as a sequence labeling task and develop two schemes to handle it. Experimental results show that a large pre-trained model can effectively use its learned knowledge to extract key phrases given only low-resource supervised data.
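A minimal sketch of the sequence-labeling formulation, assuming the HuggingFace transformers API; the checkpoint name and the binary keep/drop tagging scheme are illustrative choices, not necessarily the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Multilingual pre-trained encoder with a token-classification head;
# label 1 = token kept in the headline, label 0 = token dropped.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

text = "The history of the city dates back to the twelfth century."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, 2)

# NOTE: the classification head is freshly initialized here, so these tags
# are random until the model is fine-tuned on labeled headline data.
tags = logits.argmax(-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print([t for t, tag in zip(tokens, tags) if tag == 1])
```

The transfer-learning appeal of this formulation is that the multilingual encoder is shared across languages, so the small supervised corpus only has to teach the binary tagging head.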