Wang Yuanlong
2025
Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Boxiang Ma
|
Ru Li
|
Wang Yuanlong
|
Hongye Tan
|
Xiaoli Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Driven by vast and diverse textual data, large language models (LLMs) have demonstrated impressive performance across numerous natural language processing (NLP) tasks. Yet, a critical question persists: does their generalization arise from mere memorization of training data or from deep semantic understanding? To investigate this, we propose a bi-perspective evaluation framework to assess LLMs’ scenario cognition—the ability to link semantic scenario elements with their arguments in context. Specifically, we introduce a novel scenario-based dataset comprising diverse textual descriptions of fictional facts, annotated with scenario elements. LLMs are evaluated through their capacity to answer scenario-related questions (model output perspective) and via probing their internal representations for encoded scenario element-argument associations (internal representation perspective). Our experiments reveal that current LLMs predominantly rely on superficial memorization, failing to achieve robust semantic scenario cognition, even in simple cases. These findings expose critical limitations in LLMs’ semantic understanding and offer cognitive insights for advancing their capabilities.
2023
Improving Sequential Model Editing with Fact Retrieval
Xiaoqi Han
|
Ru Li
|
Hongye Tan
|
Wang Yuanlong
|
Qinghua Chai
|
Jeff Pan
Findings of the Association for Computational Linguistics: EMNLP 2023
The task of sequential model editing is to fix erroneous knowledge in Pre-trained Language Models (PLMs) efficiently, precisely, and continuously. Although existing methods can deal with a small number of modifications, they experience a performance decline or require additional annotated data when the number of edits increases. In this paper, we propose a Retrieval Augmented Sequential Model Editing framework (RASE) that leverages factual information to enhance editing generalization and to guide the identification of edits by retrieving related facts from the fact-patch memory we constructed. Our main findings are: (i) state-of-the-art models can hardly correct massive mistakes stably and efficiently; (ii) even when scaled up to thousands of edits, RASE significantly enhances editing generalization and maintains consistent performance and efficiency; (iii) RASE can edit large-scale PLMs and increase the performance of different editors. Moreover, it can integrate with ChatGPT and further improve performance. Our code and data are available at: https://github.com/sev777/RASE.
Co-authors
- Ru Li (李茹) 2
- Hongye Tan (谭红叶) 2
- Qinghua Chai (柴清华) 1
- Xiaoqi Han (韩孝奇) 1
- Xiaoli Li 1