Yanfu Zhang
2025
Controllable Memorization in LLMs via Weight Pruning
Chenjie Ni | Zhepeng Wang | Runxue Bao | Shangqian Gao | Yanfu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The evolution of pre-trained large language models (LLMs) has significantly transformed natural language processing. However, these advancements pose challenges, particularly the unintended memorization of training data, which raises ethical and privacy concerns. While prior research has largely focused on mitigating memorization or extracting memorized information, the deliberate control of memorization has been underexplored. This study addresses this gap by introducing a novel and unified gradient-based weight pruning framework to freely control memorization rates in LLMs. Our method enables fine-grained control over pruning parameters, allowing models to suppress or enhance memorization based on application-specific requirements. Experimental results demonstrate that our approach effectively balances the trade-offs between memorization and generalization, with an increase of up to 89.3% in Fractional ER suppression and 40.9% in Exact ER amplification compared to the original models.
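A minimal sketch of what a gradient-based pruning criterion for memorization control might look like, assuming a Hugging Face–style causal LM: weights are scored by a first-order Taylor importance on a batch of memorized text, and the most influential fraction is masked to suppress memorization. The scoring rule, the per-layer ratio `rho`, and the helper names are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: Taylor-importance scoring and the per-layer
# ratio `rho` are assumptions, not the paper's exact pruning criterion.
import torch

def memorization_pruning_masks(model, memorized_batch, rho=0.01):
    """Score each weight matrix by its gradient-based contribution to the
    loss on memorized text, then mask the top-`rho` fraction per layer.
    `memorized_batch` holds input_ids/attention_mask for an (assumed)
    Hugging Face causal LM."""
    model.zero_grad()
    out = model(**memorized_batch, labels=memorized_batch["input_ids"])
    out.loss.backward()  # gradients of the memorization objective

    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None or p.dim() < 2:        # skip biases and norms
            continue
        score = (p * p.grad).abs()               # Taylor importance |w * dL/dw|
        k = max(1, int(rho * score.numel()))
        thresh = score.flatten().topk(k).values.min()
        masks[name] = (score < thresh).float()   # 0 = prune, 1 = keep
    return masks

def apply_masks(model, masks):
    """Zero out the selected weights in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```

In this sketch, suppression prunes the highest-scoring weights; a plausible amplification variant would instead target low-scoring weights, consistent with the bidirectional control the abstract describes, though the paper's actual formulation may differ.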
Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization
Mahmud Wasif Nafee | Maiqi Jiang | Haipeng Chen | Yanfu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) excel at factual recall yet still propagate stale or incorrect knowledge. In-context knowledge editing offers a gradient-free remedy suitable for black-box APIs, but current editors rely on static demonstration sets chosen by surface-level similarity, leading to two persistent obstacles: (i) a quantity–quality trade-off, and (ii) lack of adaptivity to task difficulty. We address these issues by dynamically selecting supporting demonstrations according to their utility for the edit. We propose **D**ynamic **R**etriever for **I**n-Context **K**nowledge **E**diting (DR-IKE), a lightweight framework that (1) trains a BERT retriever with REINFORCE to rank demonstrations by editing reward, and (2) employs a *learnable threshold σ* to prune low-value examples, shortening the prompt when the edit is easy and expanding it when the task is hard. DR-IKE performs editing without modifying model weights, relying solely on forward passes for compatibility with black-box LLMs. On the CounterFact benchmark, it improves edit success by up to 17.1%, reduces latency by 41.6%, and preserves accuracy on unrelated queries, demonstrating scalable and adaptive knowledge editing.
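A hedged sketch of one plausible instantiation of the retriever training loop described above: a linear scorer over pooled BERT embeddings with a learnable threshold σ, updated by REINFORCE against an editing reward. The Bernoulli selection policy, the constant baseline, and the `edit_reward_fn` stub are assumptions for illustration, not DR-IKE's exact design.

```python
# Illustrative sketch only: the Bernoulli policy, constant baseline, and
# reward stub are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class DemoSelector(nn.Module):
    """Ranks candidate demonstrations from pooled BERT embeddings and
    prunes those falling below a learnable threshold sigma."""
    def __init__(self, hidden=768):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)
        self.sigma = nn.Parameter(torch.zeros(1))   # learnable threshold

    def forward(self, demo_embeds):
        # demo_embeds: (n_demos, hidden) pooled encoder features
        scores = self.scorer(demo_embeds).squeeze(-1)
        return torch.sigmoid(scores - self.sigma)   # keep-probabilities

def reinforce_step(selector, demo_embeds, edit_reward_fn, optim, baseline=0.5):
    """One policy-gradient update: sample a demo subset, query the frozen
    LLM for an editing reward, and reinforce the sampled choices."""
    keep_prob = selector(demo_embeds)
    dist = torch.distributions.Bernoulli(keep_prob)
    mask = dist.sample()                  # which demos enter the prompt
    reward = edit_reward_fn(mask)         # e.g. 1.0 if the edit succeeds
    loss = -(reward - baseline) * dist.log_prob(mask).sum()
    optim.zero_grad()
    loss.backward()
    optim.step()
    return reward
```

Because only `DemoSelector` receives gradients, the edited LLM stays frozen and is queried purely through forward passes, matching the black-box constraint the abstract emphasizes.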
2024
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
Zhepeng Wang | Runxue Bao | Yawen Wu | Jackson Taylor | Cao Xiao | Feng Zheng | Weiwen Jiang | Shangqian Gao | Yanfu Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Pretrained large language models (LLMs) have excelled in a variety of natural language processing (NLP) tasks, including summarization, question answering, and translation. However, LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement. Therefore, accurate measurement of memorization is essential to evaluate and mitigate these potential risks. Yet previous attempts to characterize memorization are constrained by either using prefixes only or by prepending a constant soft prompt to the prefixes, which cannot react to changes in input. To address this challenge, we propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts. Our approach involves training a transformer-based generator to produce soft prompts that adapt to changes in input, thereby enabling more accurate extraction of memorized data. Our method not only addresses the limitations of previous methods but also demonstrates superior performance in diverse experimental settings compared to state-of-the-art techniques. In particular, our method achieves maximum relative improvements of 135.3% and 39.8% over the vanilla baseline on average in terms of *discoverable memorization rate* for the text generation task and code generation task, respectively. Our code is available at https://github.com/wangger/llm-memorization-dsp.
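A minimal sketch of a prefix-dependent soft-prompt generator in the spirit of the abstract above: learned query tokens attend to the prefix's token embeddings through a small Transformer encoder and are returned as soft-prompt vectors to prepend before the prefix. The layer sizes, the query-token design, and `n_soft_tokens` are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch only: architecture sizes and the query-token design
# are assumptions, not the paper's released implementation.
import torch
import torch.nn as nn

class SoftPromptGenerator(nn.Module):
    """Produces soft-prompt embeddings conditioned on the prefix, so the
    prompt adapts to each input instead of staying constant."""
    def __init__(self, d_model=768, n_soft_tokens=20, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.queries = nn.Parameter(torch.randn(n_soft_tokens, d_model) * 0.02)

    def forward(self, prefix_embeds):
        # prefix_embeds: (batch, prefix_len, d_model) prefix token embeddings
        b = prefix_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        # condition the learned query tokens on the prefix
        h = self.encoder(torch.cat([q, prefix_embeds], dim=1))
        return h[:, : self.queries.size(0)]     # (batch, n_soft_tokens, d_model)

# Hypothetical usage against a frozen LM that accepts `inputs_embeds`:
#   soft = generator(embed(prefix_ids))
#   inputs_embeds = torch.cat([soft, embed(prefix_ids)], dim=1)
#   logits = frozen_lm(inputs_embeds=inputs_embeds).logits  # train generator only
```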