Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs
Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, Georg Groh
Abstract
Chain-of-Thought (CoT) prompting improves LLM reasoning but can increase privacy risk by resurfacing personally identifiable information (PII) from the prompt into reasoning traces and outputs, even under policies that instruct the model not to restate PII. We study such direct, inference-time PII leakage using a model-agnostic framework that (i) defines leakage as risk-weighted, token-level events across 11 PII types, (ii) traces leakage curves as a function of the allowed CoT budget, and (iii) compares open- and closed-source model families on a structured PII dataset with a hierarchical risk taxonomy. We find that CoT consistently elevates leakage, especially for high-risk categories, and that leakage is strongly family- and budget-dependent: increasing the reasoning budget can either amplify or attenuate leakage depending on the base model. We then benchmark lightweight inference-time gatekeepers: a rule-based detector, a TF–IDF + logistic regression classifier, a GLiNER-based NER model, and an LLM-as-judge, using risk-weighted F1, Macro-F1, and recall. No single method dominates across models or budgets, motivating hybrid, style-adaptive gatekeeping policies that balance utility and risk under a common, reproducible protocol.
- Anthology ID:
- 2026.privatenlp-main.10
- Volume:
- Proceedings of the Seventh Workshop on Privacy in Natural Language Processing
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Ivan Habernal, Sepideh Ghanavati, Sara Haghighi, Krithika Ramesh, Timour Igamberdiev, Shomir Wilson
- Venues:
- PrivateNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 140–164
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.10/
- Cite (ACL):
- Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, and Georg Groh. 2026. Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs. In Proceedings of the Seventh Workshop on Privacy in Natural Language Processing, pages 140–164, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs (Ahrend et al., PrivateNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.10.pdf