Proceedings of the Seventh Workshop on Privacy in Natural Language Processing
Ivan Habernal, Sepideh Ghanavati, Sara Haghighi, Krithika Ramesh, Timour Igamberdiev, Shomir Wilson (Editors)
- Anthology ID: 2026.privatenlp-main
- Month: July
- Year: 2026
- Address: San Diego, California
- Venues: PrivateNLP | WS
- Events: Annual Meeting of the Association for Computational Linguistics (2026) | Workshop on Privacy in NLP (2026) | Other Workshops and Events (2026)
- SIG:
- Publisher: Association for Computational Linguistics
- URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main/
- DOI:
- ISBN: 979-8-89176-397-5
- PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.pdf
Proceedings of the Seventh Workshop on Privacy in Natural Language Processing
Ivan Habernal | Sepideh Ghanavati | Sara Haghighi | Krithika Ramesh | Timour Igamberdiev | Shomir Wilson
From Conventional Web Privacy to Agentic Disclosure: How Tool Schemas May Invite LLM Oversharing
Shahriar Shayesteh | Shomir Wilson
LLM agents increasingly act on behalf of users by selecting tools and constructing API requests to external services. This creates a new privacy risk in agentic systems: disclosure is no longer limited to what users directly enter into a form, but can instead be generated by the agent at runtime. In conventional web settings, disclosure is largely bounded by the user-facing interface, and what is appropriate to share varies across service contexts. In tool-using agents, however, disclosure is generated at runtime when user intent is translated into tool-call arguments for a particular receiving service, making context-sensitive disclosure boundaries harder to preserve. In this position paper, we argue that the runtime tool call is the key unit of privacy analysis in agentic systems. Our contribution is diagnostic rather than behavioral: instead of measuring realized leakage, we analyze interface conditions that may make agent oversharing more plausible. In particular, schemas that expose generic, weakly constrained free-text fields leave part of disclosure under agent discretion. In a case study of 2,344 tool specifications from the OpenAI GPT ecosystem, we find that 36.9% expose at least one such channel, creating conditions for within-context over-disclosure, cross-context leakage, and what we call contextual flattening. We conclude by outlining a research agenda for NLP that moves beyond output-only evaluation toward argument-level analysis of what tool schemas allow agents to send to third-party services.
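As a rough illustration of what a "generic, weakly constrained free-text field" could look like in practice, the sketch below scans a JSON-Schema-style parameter block and flags string properties that carry no constraining keyword. The keyword list, the function name `free_text_fields`, and the example tool are all assumptions made for this example; they are not the paper's operational definition or data.

```python
from typing import Any, Dict, List

# Keywords treated as constraining a string field. This list is an
# assumption for illustration, not the paper's definition.
CONSTRAINING_KEYWORDS = ("enum", "const", "pattern", "format", "maxLength")


def free_text_fields(parameters: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of string properties with no constraining keyword."""
    flagged: List[str] = []
    for name, spec in parameters.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("type") == "string":
            if not any(key in spec for key in CONSTRAINING_KEYWORDS):
                flagged.append(path)
        elif spec.get("type") == "object":
            # Recurse into nested objects so inner fields are checked too.
            flagged.extend(free_text_fields(spec, prefix=f"{path}."))
    return flagged


# Hypothetical tool specification, invented for this example.
example_tool = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "enum": ["San Diego", "Boston"]},
        "date": {"type": "string", "format": "date"},
        "notes": {"type": "string", "description": "Anything else we should know"},
        "contact": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "format": "email"},
                "message": {"type": "string"},
            },
        },
    },
}

flagged = free_text_fields(example_tool)
print(flagged)
```

Under these assumptions, the fields `notes` and `contact.message` are flagged, since nothing in the schema limits what an agent may place in them.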
The Challenge of Identifying the Origin of Black-Box Large Language Models
Ziqing Yang | Yixin Wu | Yun Shen | Wei Dai | Michael Backes | Yang Zhang
The tremendous commercial potential of large language models (LLMs) has heightened concerns over their unauthorized use. To address this, we focus on the task of identifying the origin of black-box LLMs. We further propose PlugAE, an effective and efficient identification method that proactively leverages LLM-specific adversarial embeddings and allows users to customize copyright tokens on a targeted query set. Extensive experiments demonstrate that PlugAE outperforms both state-of-the-art model watermarking and fingerprinting methods in accuracy and robustness. We further analyze its stealthiness and reliability from three complementary perspectives and conduct ablation studies under various configurations, confirming its practicality for real-world misuse detection.
SecureLLM: Using Inference-time Compositionality to Build Secure Language Models
Abdulrahman Alabdulkareem | Christian Michael Arnold | Yerim Lee | Pieter M Feenstra | Conner Arnold | Boris Katz | Andrei Barbu | Brian Cheung
As Large Language Models (LLMs) increasingly support critical sectors such as healthcare, finance, and public governance, ensuring data confidentiality and robust access control is a pressing societal challenge. Traditional security mechanisms isolate sensitive resources from unauthorized users, yet existing LLM safety approaches often fail to enforce strict segregation of confidential data. In this work, we introduce SecureLLM, a novel compositional framework for building secure LLMs that integrates fine-tuning with traditional access security measures to protect private information. By fine-tuning LLMs on segregated, “siloed” training data and composing their outputs at inference time based solely on a user’s verified credentials, SecureLLM not only prevents unauthorized data leakage but also enables accurate responses for complex queries spanning multiple data silos. Our method is demonstrated on a challenging natural-language-to-SQL translation task and is designed with real-world applications in mind, where protecting sensitive information is critical.
STAMP-R: Stylometric Text Anonymization with Memory-guided Policy Rewriting
Zhan Shi | Yefeng Yuan | Liang Cheng | Yuhong Liu
Modern machine learning systems rely heavily on large-scale textual data that often contain sensitive personal information. Although conventional anonymization techniques remove explicit identifiers, textual data remain vulnerable to authorship inference attacks that exploit persistent stylometric signals. Recent approaches leverage Large Language Models (LLMs) to rewrite text and obscure such signals, but they frequently overlook distinctive stylometric outliers and fail to achieve a favorable privacy–utility trade-off due to rigid, one-size-fits-all obfuscation strategies, while also incurring high computational costs. To address these challenges, we propose STAMP-R, a risk-adaptive reinforcement learning framework for instance-level authorship anonymization. We formulate anonymization as a risk-aware, instance-level style distribution shaping problem. Central to our approach is the Style Manifold Memory (SMM), which models the global stylistic landscape via prototype-based density estimation. SMM detects high-risk stylometric outliers and adaptively modulates a composite reward function, enabling stronger obfuscation for highly identifiable samples while preserving semantic fidelity for low-risk instances. We further distill a lightweight 3B-parameter model from a teacher LLM for efficient local deployment. Experiments show that STAMP-R reduces authorship re-identification risk while maintaining strong downstream utility.
Loss Masking Under the Hood: Backdoor Concealment and Private Data Memorization in LLMs
Tagore Rao Kosireddy | Evan Lucas
Loss masking has been proposed as a method for preventing language models from generating specific content by selectively zeroing the training loss on sensitive tokens, which allows a language model to learn protected content as context without learning to reproduce it (CITATION). Although the method is promising, many critical questions about its impact on a model remain unanswered. In this work, we investigate the impact of loss masking on internal model representation and context understanding using a small causal language model (GPT-2) at three scales (124M, 355M, 774M parameters) and apply mechanistic interpretability tools including causal tracing, attention analysis, and linear probing. We explore two use cases of loss masking: backdoor concealment and prevention of memorization of named entities. In both settings, we find that loss masking successfully blocks generation of the protected tokens. Through mechanistic analysis, we show that protected token identity remains fully encoded in hidden states regardless of loss masking, confirming that loss masking suppresses the output pathway but not the internal encoding. Code is available at https://github.com/Tagore-7/loss-masking-analysis
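The idea of zeroing the training loss on sensitive tokens can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' implementation: the function name `masked_cross_entropy`, the toy logits, and the choice to average over unmasked positions only are assumptions made for this example.

```python
import math
from typing import Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    loss_mask: Sequence[int],
) -> float:
    """Mean next-token cross-entropy over positions where loss_mask is 1.

    Positions with loss_mask == 0 (the protected tokens) contribute nothing
    to the loss, although they would still be visible as input context.
    """
    total = 0.0
    kept = 0
    for row, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue
        peak = max(row)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[target]
        kept += 1
    return total / kept if kept else 0.0


# Toy example: three positions over a three-token vocabulary.
# Position 1 is treated as a sensitive token and is masked out.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 3.0, 0.0]]
toy_targets = [0, 0, 1]

masked_loss = masked_cross_entropy(toy_logits, toy_targets, [1, 0, 1])
full_loss = masked_cross_entropy(toy_logits, toy_targets, [1, 1, 1])
print(masked_loss, full_loss)
```

In this toy case the masked position is badly predicted, so the unmasked loss is larger than the masked one; with the mask applied, no gradient signal would push the model toward producing that token.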
Prompt Stylometry for On-Device Affect-Adaptive AI: A Feasibility Study in Linguistic Signal Detection and Response Steering
Debmalya Pal
Every user prompt contains latent linguistic signals beyond its explicit semantic content (lexical choice, hedging, sentence structure, and discourse patterns) that reflect the user’s affective state and cognitive style. Yet most large language models are optimized for generalized assistant behavior rather than explicit adaptation to these fine-grained signals. We introduce Prompt Stylometry, a framework for detecting affective and cognitive-style signals directly from user prompts and using them to steer response generation. We study two categories of signals: affect-related cues associated with emotional states, and cognitive-style cues associated with patterns such as analytical, exploratory, self-critical, or indecisive reasoning. This inference capability, however, creates substantial privacy risks: any system processing prompts server-side could implicitly profile users’ psychological states without their knowledge or consent. This motivates our core design choice of a fully on-device architecture in which no interaction data leaves the user’s device. We benchmark three annotation paradigms (lexicon-based, neural, and generative) across 600 synthetic prompts spanning 30 stylometric profiles, and evaluate affect-adaptive response steering across two small language model families under 5B parameters. Our results show systematic differences in both signal detection behavior and downstream steering responsiveness across annotation methods and model families, demonstrating the feasibility of privacy-preserving affect-adaptive AI on consumer hardware while identifying annotation paradigm sensitivity and cross-profile transfer as key open challenges.
Differential Privacy (DP) for text has matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. By conducting a multidimensional stylistic profiling of differentially private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text’s communicative signature. This shift is characterized by the severe attrition of interactive markers, contextual references, and complex subordination. By comparing autoregressive paraphrasing against bidirectional substitution across a spectrum of privacy budgets, we observe that both architectures force convergence toward a non-involved and non-persuasive register. This register-blind sanitization effectively preserves semantic content but structurally homogenizes the nuanced stylistic markers that define human-authored discourse.
Privacy-preserving natural language processing (NLP) typically focuses on removing explicit identifiers such as names, addresses, and phone numbers. We argue that this approach overlooks a key risk: natural language itself encodes signals about a speaker’s geographic origin, social background, and community membership that persist after anonymization. We introduce Linguistic Identity Leakage (LIL), defined as the inference of personal or demographic attributes from linguistic features in text where explicit identifiers have been removed. We further introduce Linguistic Personally Identifiable Information (L-PII) to denote the linguistic features that enable such inference. Drawing on sociolinguistics, stylometry, and NLP privacy research, we propose a taxonomy of linguistic identity signals across five categories and examine implications for dataset release, language model training, and privacy auditing. Using examples from Arabic dialectal variation and other multilingual contexts, we present the Identity Inference Risk (IIR) framework for assessing residual privacy risk in NLP systems and discuss how contemporary LLMs amplify these risks. Our goal is to encourage broader recognition of the gap between conventional anonymization practices and the linguistic reality of natural language data.
A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation
Stephen Meisenbacher | Angelo Kleinert | Florian Matthes
The goal of *differentially private text obfuscation* is to obfuscate, or "perturb", input texts with Differential Privacy (DP) guarantees, such that the private output texts are quantifiably indistinguishable from the originals. While perturbation at the word level is intuitive, meaningful text privatization happens on complete documents. Recent research has laid the groundwork for reasoning about *privacy budget distribution*, namely, how an overall 𝜀 budget can be sensibly distributed among the component pieces of a text. We perform a systematic evaluation of multiple text decomposition and budget distribution techniques in the context of DP text obfuscation, testing how different methods for chunking texts can be combined with techniques for allocating 𝜀 to these chunks. Our experiments reveal that such design choices are very important, as even with comparable privacy budgets, significantly different results can occur based on which methods are chosen. In this, we provide credible evidence of the feasibility of maximizing empirical trade-offs by optimizing DP obfuscation procedures.
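To make the notion of privacy budget distribution concrete, the sketch below chunks a text into sentences and splits an overall 𝜀 across the chunks in two ways. It assumes basic sequential composition, under which per-chunk budgets add up to the document-level budget. The naive sentence splitter and the two allocation strategies (`uniform_budget`, `length_proportional_budget`) are illustrative assumptions, not necessarily the techniques evaluated in the paper, and no actual DP mechanism is applied here.

```python
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive chunking on '.', '!' and '?' (illustrative only)."""
    chunks: List[str] = []
    current: List[str] = []
    for ch in text:
        current.append(ch)
        if ch in ".!?":
            chunk = "".join(current).strip()
            if chunk:
                chunks.append(chunk)
            current = []
    tail = "".join(current).strip()
    if tail:
        chunks.append(tail)
    return chunks


def uniform_budget(chunks: List[str], epsilon: float) -> List[float]:
    """Give every chunk the same share of the total budget."""
    if not chunks:
        return []
    return [epsilon / len(chunks)] * len(chunks)


def length_proportional_budget(chunks: List[str], epsilon: float) -> List[float]:
    """Give each chunk a share proportional to its whitespace token count."""
    lengths = [len(chunk.split()) for chunk in chunks]
    total = sum(lengths)
    if total == 0:
        return []
    return [epsilon * n / total for n in lengths]


# Invented example text and an arbitrary total budget.
text = "I live in Garching. My phone broke yesterday and I need a new one."
chunks = split_sentences(text)
uniform = uniform_budget(chunks, epsilon=14.0)
proportional = length_proportional_budget(chunks, epsilon=14.0)
print(chunks, uniform, proportional)
```

Both allocations spend the same total 𝜀, yet the second sentence receives a larger share under the length-proportional scheme, which is the kind of design choice whose downstream effect the abstract describes as significant.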
Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs
Patrick Ahrend | Tobias Eder | Xiyang Yang | Zhiyi Pan | Georg Groh
Chain-of-Thought (CoT) prompting improves LLM reasoning but can increase privacy risk by resurfacing personally identifiable information (PII) from the prompt into reasoning traces and outputs, even under policies that instruct the model not to restate PII. We study such direct, inference-time PII leakage using a model-agnostic framework that (i) defines leakage as risk-weighted, token-level events across 11 PII types, (ii) traces leakage curves as a function of the allowed CoT budget, and (iii) compares open- and closed-source model families on a structured PII dataset with a hierarchical risk taxonomy. We find that CoT consistently elevates leakage, especially for high-risk categories, and that leakage is strongly family- and budget-dependent: increasing the reasoning budget can either amplify or attenuate leakage depending on the base model. We then benchmark lightweight inference-time gatekeepers: a rule-based detector, a TF–IDF + logistic regression classifier, a GLiNER-based NER model, and an LLM-as-judge, using risk-weighted F1, Macro-F1, and recall. No single method dominates across models or budgets, motivating hybrid, style-adaptive gatekeeping policies that balance utility and risk under a common, reproducible protocol.
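One way to picture "risk-weighted, token-level" leakage is to count how many tokens of known prompt PII reappear in a reasoning trace, weighting each hit by the risk of its PII type. The sketch below does exactly that. The weights in `RISK_WEIGHTS`, the whitespace tokenizer, and the example strings are hypothetical and chosen for illustration; they do not reproduce the paper's taxonomy, its 11 PII types, or its metric.

```python
from typing import Dict, List, Tuple

# Hypothetical risk weights per PII type (not taken from the paper).
RISK_WEIGHTS: Dict[str, float] = {"email": 1.0, "phone": 0.8, "city": 0.3}


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def leakage_score(trace: str, pii: List[Tuple[str, str]]) -> float:
    """Sum risk weights over trace tokens that match a token of known PII."""
    token_weight: Dict[str, float] = {}
    for value, pii_type in pii:
        weight = RISK_WEIGHTS.get(pii_type, 0.0)
        for tok in tokenize(value):
            # If a token belongs to several PII values, keep the highest risk.
            token_weight[tok] = max(weight, token_weight.get(tok, 0.0))
    return sum(token_weight.get(tok, 0.0) for tok in tokenize(trace))


# Invented PII and reasoning traces for this example.
prompt_pii = [("alice@example.com", "email"), ("San Diego", "city")]
leaky_trace = "The user alice@example.com lives in San Diego, so shipping is local."
clean_trace = "The customer lives nearby, so shipping is local."

leaky_score = leakage_score(leaky_trace, prompt_pii)
clean_score = leakage_score(clean_trace, prompt_pii)
print(leaky_score, clean_score)
```

A gatekeeper in this spirit could block or rewrite a trace whose score exceeds a chosen threshold; exact token matching is only a stand-in for the detectors the abstract compares.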