Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs

Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, Georg Groh


Abstract
Chain-of-Thought (CoT) prompting improves LLM reasoning but can increase privacy risk by resurfacing personally identifiable information (PII) from the prompt into reasoning traces and outputs, even under policies that instruct the model not to restate PII. We study such direct, inference-time PII leakage using a model-agnostic framework that (i) defines leakage as risk-weighted, token-level events across 11 PII types, (ii) traces leakage curves as a function of the allowed CoT budget, and (iii) compares open- and closed-source model families on a structured PII dataset with a hierarchical risk taxonomy. We find that CoT consistently elevates leakage, especially for high-risk categories, and that leakage is strongly family- and budget-dependent: increasing the reasoning budget can either amplify or attenuate leakage depending on the base model. We then benchmark lightweight inference-time gatekeepers: a rule-based detector, a TF–IDF + logistic regression classifier, a GLiNER-based NER model, and an LLM-as-judge, using risk-weighted F1, Macro-F1, and recall. No single method dominates across models or budgets, motivating hybrid, style-adaptive gatekeeping policies that balance utility and risk under a common, reproducible protocol.
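As a rough illustration of what "risk-weighted, token-level" leakage could look like in practice, the following minimal sketch counts verbatim reappearances of known PII values in a reasoning trace and weights each hit by a per-type risk score. The weights, the PII type names, the whitespace tokenizer, and the function `risk_weighted_leakage` are all assumptions made for illustration; the abstract does not specify the paper's actual metric, taxonomy, or weights.

```python
from typing import Dict, List, Tuple

# Hypothetical risk weights per PII type (illustrative only; not taken from the paper).
RISK_WEIGHTS: Dict[str, float] = {"name": 1.0, "email": 2.0, "ssn": 3.0}


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer; a stand-in for a real tokenizer."""
    return text.lower().split()


def risk_weighted_leakage(
    trace: str,
    pii: List[Tuple[str, str]],
    weights: Dict[str, float] = RISK_WEIGHTS,
) -> float:
    """Sum risk weights over verbatim, contiguous token matches of each PII value.

    `pii` is a list of (pii_type, value) pairs taken from the prompt.
    Unknown PII types fall back to a weight of 1.0 (an assumption).
    """
    trace_tokens = tokenize(trace)
    total = 0.0
    for pii_type, value in pii:
        value_tokens = tokenize(value)
        n = len(value_tokens)
        if n == 0:
            continue  # skip empty values so they cannot match everywhere
        # Count every position where the PII value appears as a token span.
        hits = sum(
            1
            for i in range(len(trace_tokens) - n + 1)
            if trace_tokens[i : i + n] == value_tokens
        )
        total += weights.get(pii_type, 1.0) * hits
    return total


example_trace = "The user Jane Doe wrote from jane@example.com today"
example_pii = [
    ("name", "Jane Doe"),
    ("email", "jane@example.com"),
    ("ssn", "123-45-6789"),
]
example_score = risk_weighted_leakage(example_trace, example_pii)
```

In this toy example the name and the email each reappear once and the SSN does not, so the score is the sum of the name and email weights. Evaluating the same function on traces generated under different CoT budgets would give one point per budget on a leakage curve of the kind the abstract describes.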
Anthology ID:
2026.privatenlp-main.10
Volume:
Proceedings of the Seventh Workshop on Privacy in Natural Language Processing
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Ivan Habernal, Sepideh Ghanavati, Sara Haghighi, Krithika Ramesh, Timour Igamberdiev, Shomir Wilson
Venues:
PrivateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
140–164
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.10/
Cite (ACL):
Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, and Georg Groh. 2026. Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs. In Proceedings of the Seventh Workshop on Privacy in Natural Language Processing, pages 140–164, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs (Ahrend et al., PrivateNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.10.pdf