Shengfei Lyu
2025
Generation-Augmented and Embedding Fusion in Document-Level Event Argument Extraction
Xingjian Lin | Shengfei Lyu | Xin Wang | Qiuju Chen | Huanhuan Chen
Proceedings of the 31st International Conference on Computational Linguistics
Document-level event argument extraction is a crucial task that aims to extract arguments from an entire document, beyond sentence-level analysis. Prior classification-based models still fail to explicitly capture significant role relationships and rely heavily on large-scale datasets. In this study, we propose a novel approach called Generation-Augmented and Embedding Fusion. This approach first uses predefined templates and generative language models to produce an embedding that captures role relationship information, and then integrates it into the foundational embedding derived from a classification model through a novel embedding fusion mechanism. We conduct extensive experiments on the RAMS and WikiEvents datasets to demonstrate that our approach is more effective than the baselines, and that it is also data-efficient in low-resource scenarios.
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
Chaoyue He | Xin Zhou | Yi Wu | Xinjia Yu | Yan Zhang | Lei Zhang | Di Wang | Shengfei Lyu | Hong Xu | Wang Xiaoqiao | Wei Liu | Chunyan Miao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social, and Governance (ESG) and sustainability-focused question answering. ESGenius comprises two key components: (i) ESGenius-QA, a collection of 1,136 Multiple-Choice Questions (MCQs) generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics. Each question is systematically linked to its corresponding source text, enabling transparent evaluation and supporting Retrieval-Augmented Generation (RAG) methods; and (ii) ESGenius-Corpus, a meticulously curated repository of 231 foundational frameworks, standards, reports, and recommendation documents from 7 authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of LLMs, we implement a rigorous two-stage evaluation protocol—Zero-Shot and RAG. Extensive experiments across 50 LLMs (0.5B to 671B) demonstrate that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies around 55–70%, highlighting a significant knowledge gap for LLMs in this specialized, interdisciplinary domain. However, models employing RAG demonstrate significant performance improvements, particularly for smaller models. For example, DeepSeek-R1-Distill-Qwen-14B improves from 63.82% (zero-shot) to 80.46% with RAG. These results demonstrate the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first comprehensive QA benchmark designed to rigorously evaluate LLMs on ESG and sustainability knowledge, providing a critical tool to advance trustworthy AI in this vital domain.
2021
Relation Classification with Entity Type Restriction
Shengfei Lyu | Huanhuan Chen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Co-authors
- Huanhuan Chen 2
- Qiuju Chen 1
- Chaoyue He 1
- Xingjian Lin 1
- Wei Liu 1