Ivoline C. Ngong
2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
Ivoline C. Ngong | Swanand Ravindra Kadhe | Hao Wang | Keerthiram Murugesan | Justin D. Weisz | Amit Dhurandhar | Karthikeyan Natesan Ramamurthy
Findings of the Association for Computational Linguistics: ACL 2025
Conversational agents are increasingly woven into individuals’ personal lives, yet users often underestimate the privacy risks associated with them. The moment users share information with these agents, such as large language models (LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLM-based Conversational Agents (LCAs). Contextual privacy aims to minimize privacy risks by ensuring that users (the senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LCAs (untrusted receivers). Through a formative design user study, we observe how even “privacy-conscious” users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LCAs, identifying and reformulating out-of-context information in user prompts. Our evaluation on examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user’s intended interaction goals. Notably, about 76% of participants in our human evaluation preferred the reformulated prompts over the original ones, validating the usability and effectiveness of contextual privacy in our proposed framework. We open-source the code at https://github.com/IBM/contextual-privacy-LLM.
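A minimal sketch of the kind of local pre-processing step the framework describes: a filter that runs on the user's machine, flags out-of-context details in a prompt, and reformulates them before anything reaches the untrusted agent. The regex detector and placeholder wording below are illustrative stand-ins, not the released IBM/contextual-privacy-LLM implementation, which uses lightweight local models for detection and rewriting.

# Toy local filter between the user and an LLM-based conversational agent (LCA).
# The regex "detector" stands in for the paper's locally deployed models.
import re
from typing import Callable

# Illustrative patterns for details that are rarely necessary for the task itself.
OUT_OF_CONTEXT_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def reformulate(prompt: str) -> str:
    """Replace out-of-context disclosures with neutral placeholders while
    keeping the part of the prompt that carries the user's actual goal."""
    for label, pattern in OUT_OF_CONTEXT_PATTERNS.items():
        prompt = pattern.sub(f"[{label} withheld]", prompt)
    return prompt

def send_to_lca(prompt: str, lca: Callable[[str], str]) -> str:
    """Run the local filter first, then forward the reformulated prompt to the
    (untrusted) conversational agent."""
    return lca(reformulate(prompt))

if __name__ == "__main__":
    user_prompt = ("Draft a complaint email for me. My SSN is 123-45-6789 "
                   "and you can mention my number, 802-555-0173.")
    print(reformulate(user_prompt))
    # -> Draft a complaint email for me. My SSN is [ssn withheld] and you can
    #    mention my number, [phone withheld].

The filter runs entirely locally, so the raw prompt never leaves the user's device; only the reformulated version is forwarded to the agent.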
Differentially Private Learning Needs Better Model Initialization and Self-Distillation
Ivoline C. Ngong | Joseph Near | Niloofar Mireshghallah
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP finetuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine’s generations in 78.38% of cases across all datasets and metrics, while also demonstrating substantial improvements in lexical diversity, achieving 85.31% in MSTTR and 86.82% in Jaccard similarity. Our fine-grained analysis reveals that DPRefine reduces linguistic errors in generated text by 84%, mitigating grammar errors, spelling mistakes, and missing punctuation commonly associated with DPSGD. It also reduces inconsistencies present in non-private models, such as fabricated details and misattributed quotes. We find that small models like GPT-2 and T5 are effective for initialization and distillation, highlighting their potential in enabling scalable and efficient deployment of high-performing, privacy-preserving language models with improved linguistic quality and consistency.
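As background for why better initialization and distillation help, a generic DP-SGD update clips each example's gradient and adds calibrated Gaussian noise, which is the source of the utility and fluency loss that DPRefine's first and third phases are meant to recover. Below is a minimal NumPy sketch of that update on a toy linear model; it is not the paper's code, privacy accounting is omitted, and clip_norm and noise_multiplier are illustrative values.

# One DP-SGD step: per-example gradient clipping plus Gaussian noise,
# shown on a tiny linear-regression problem.
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each example's gradient to clip_norm, sum, add noise scaled to the
    clipping bound, then average and take a gradient step."""
    per_example_grads = [2.0 * (xi @ w - yi) * xi for xi, yi in zip(X, y)]
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm) for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)

# Toy usage: privately fit y = 2x on one small batch.
X = rng.normal(size=(32, 1))
y = 2.0 * X[:, 0]
w = np.zeros(1)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print(w)  # close to 2.0, perturbed by clipping and the added noise

DPRefine keeps this private training step but sandwiches it between a synthetic-data warm start and a post-hoc self-distillation pass, which the abstract credits for the gains over vanilla DPSGD in diversity and error rate.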