Zhihao Yao

2026

Privacy leakage has become a critical concern for large language models (LLMs), especially in retrieval-augmented generation. Current defense methods mitigate privacy leakage but still suffer from a trade-off between privacy protection and response availability. To address this problem, we propose to explicitly capture the latent leakage tendency of the LLM during the generation process, which protects privacy at a more fundamental level. In detail, we propose ABack, a training-free mechanism that synchronously monitors the decoding steps, derives the initial leakage intention by modeling mental states, and rewrites the response with privacy awareness. In addition, because formal privacy datasets are lacking, we construct a new benchmark focused on personally identifiable information. Experiments show that ABack improves privacy by up to 14% over strong baselines against adversarial attacks, without degrading response utility.

Co-authors

Bing Qin (秦兵) 1

Venues

Findings 1
