Adaptive Backtracking for Privacy Protection in Large Language Models

Zhihao Yao; Yuxuan Gu; Xiachong Feng; Weitao Ma (马伟涛); Bo Li; Xiaocheng Feng (冯骁骋); Bing Qin (秦兵)

Abstract

Privacy leakage has become a critical concern for large language models (LLMs), especially in retrieval-augmented generation. Current defense methods mitigate privacy leakage but still suffer from a trade-off between privacy protection and response availability. To address this problem, we propose to explicitly capture the latent leakage tendency of the LLM during generation, which protects privacy from a more fundamental perspective. Specifically, we propose ABack, a training-free mechanism that synchronously monitors the decoding steps, derives the initial leakage intention by modeling mental states, and rewrites the response with privacy awareness. In addition, given the lack of formal privacy datasets, we construct a new benchmark specifically for personally identifiable information. Experiments show that ABack improves privacy by up to 14% over strong baselines against adversarial attacks while avoiding degradation of response utility.

Anthology ID: 2026.findings-acl.1857
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 37278–37298
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1857/
Cite (ACL): Zhihao Yao, Yuxuan Gu, Xiachong Feng, Weitao Ma, Bo Li, Xiaocheng Feng, and Bing Qin. 2026. Adaptive Backtracking for Privacy Protection in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37278–37298, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Adaptive Backtracking for Privacy Protection in Large Language Models (Yao et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1857.pdf
Checklist: 2026.findings-acl.1857.checklist.pdf
