2025
DocMEdit: Towards Document-Level Model Editing
Li Zeng | Zeming Liu | Chong Feng | Heyan Huang | Yuhang Guo
Findings of the Association for Computational Linguistics: ACL 2025
Model editing aims to correct errors and outdated knowledge in large language models (LLMs) at minimal cost. Prior research has proposed a variety of datasets to assess the effectiveness of model editing methods. However, most existing datasets only require models to output short phrases or sentences, overlooking the widespread existence of document-level tasks in the real world and raising doubts about their practical usability. To address this limitation and promote the application of model editing in real-world scenarios, we propose the task of document-level model editing. To tackle such challenges and enhance model capabilities in practical settings, we introduce DocMEdit, a dataset focused on document-level model editing, characterized by document-level inputs and outputs, extrapolation, and multiple facts within a single edit. We propose a series of evaluation metrics and experiments. The results show that the difficulties in document-level model editing pose challenges for existing model editing methods.
2024
EFSA: Towards Event-Level Financial Sentiment Analysis
Tianyu Chen | Yiming Zhang | Guoxin Yu | Dapeng Zhang | Li Zeng | Qing He | Xiang Ao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we extend financial sentiment analysis (FSA) to the event level, since events usually serve as the subject of the sentiment in financial text. Though extracting events from financial text may be conducive to accurate sentiment predictions, it poses specialized challenges due to the length and discontinuity of events in financial text. To this end, we reconceptualize event extraction as a classification task by designing a categorization comprising coarse-grained and fine-grained event categories. Under this setting, we formulate the Event-Level Financial Sentiment Analysis (EFSA for short) task, which outputs quintuples consisting of (company, industry, coarse-grained event, fine-grained event, sentiment) from financial text. A large-scale Chinese dataset containing 12,160 news articles and 13,725 quintuples is released as a new testbed for our task. A four-hop Chain-of-Thought LLM-based approach is devised for this task. Systematic investigations are conducted on our dataset, and the empirical results demonstrate the benchmarking scores of existing methods and show that our proposed method reaches the current state of the art. Our dataset and framework implementation are available at https://github.com/cty1934/EFSA
FAME: Towards Factual Multi-Task Model Editing
Li Zeng | Yingyu Shan | Zeming Liu | Jiashu Yao | Yuhang Guo
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) embed extensive knowledge and utilize it to perform exceptionally well across various tasks. Nevertheless, outdated knowledge or factual errors within LLMs can lead to misleading or incorrect responses, causing significant issues in practical applications. To rectify these flaws without costly model retraining, various model editing approaches have been proposed to correct inaccurate information within LLMs in a cost-efficient way. To evaluate these model editing methods, previous work introduced a series of datasets. However, most previous datasets contain only fabricated data in a single format, which diverges from real-world model editing scenarios and raises doubts about their usability in practice. To facilitate the application of model editing in real-world scenarios, we propose the challenge of practicality. To resolve such challenges and effectively enhance the capabilities of LLMs, we present FAME, an authentic, comprehensive, and multi-task dataset designed to enhance the practicality of model editing. We then propose SKEME, a model editing method that uses a novel caching mechanism to ensure synchronization with the real world. The experiments demonstrate that our method performs excellently across various tasks and scenarios, confirming its practicality.
DRAMA: Dynamic Multi-Granularity Graph Estimate Retrieval over Tabular and Textual Question Answering
Ruize Yuan | Xiang Ao | Li Zeng | Qing He
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The TableTextQA task, which requires finding the answer to a question from a combination of tabular and textual data, has been gaining increasing attention. Row-based approaches have demonstrated remarkable effectiveness. However, they suffer from the following limitations: (1) a lack of interaction between rows; (2) excessively long input lengths; and (3) question attention shifts in the multi-hop QA task. To this end, we propose a novel method: Dynamic Multi-Granularity Graph Estimate Retrieval (DRAMA). Our method incorporates an interaction mechanism among multiple rows. Specifically, we utilize a memory bank to store the features of each row, thereby facilitating the construction of a heterogeneous graph with multi-row information. In addition, a Dynamic Graph Attention Network (DGAT) module is used to gauge the attention shift in the multi-hop question and dynamically eliminate noisy information. Empirical results on the widely used HybridQA and TabFact datasets demonstrate that the proposed model is effective.
2023
BIT-ACT: An Ancient Chinese Translation System Using Data Augmentation
Li Zeng | Yanzhi Tian | Yingyu Shan | Yuhang Guo
Proceedings of ALT2023: Ancient Language Translation Workshop
This paper describes a translation model from ancient Chinese to modern Chinese and English for the EvaHan 2023 competition, a subtask of the Ancient Language Translation 2023 challenge. During the training of our model, we applied various data augmentation techniques and used SiKu-RoBERTa as part of our model architecture. The results indicate that back translation improves the model’s performance, but double back translation introduces noise and harms the model’s performance. Fine-tuning on the original dataset can help mitigate this issue.