DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings

Shuya Feng, Yuan Hong


Abstract
Training text embedding models under differential privacy constraints is challenging due to the high dimensionality of language data and the presence of rare, identifying linguistic features. We propose DPED (Differentially Private Embedding Distillation), a framework that leverages teacher-student distillation with multi-layer noise injection to learn high-quality embeddings while providing differential privacy guarantees. DPED trains an ensemble of teacher models on disjoint subsets of sensitive text data, then transfers their knowledge to a student model through noisy aggregation at multiple layers. A rare-word-aware strategy adaptively handles infrequent words, improving privacy-utility trade-offs. Experiments on benchmark datasets demonstrate that DPED outperforms standard differentially private training methods, achieving substantially higher utility at the same privacy budget. Our approach protects individual word usage patterns in training documents, preventing models from memorizing unique linguistic fingerprints while maintaining practical utility for downstream NLP tasks. Source code is available at https://github.com/datasec-lab/DPED.
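The abstract's core mechanism — averaging per-layer teacher embeddings and adding calibrated noise before the student sees them — can be illustrated with a minimal sketch. This is not the paper's implementation; the clipping norm, noise scale, and function name `noisy_multilayer_aggregate` are illustrative assumptions, and the Gaussian-noise-on-clipped-averages pattern is the generic PATE-style recipe, not necessarily DPED's exact calibration.

```python
import numpy as np

def noisy_multilayer_aggregate(teacher_outputs, sigma=1.0, clip=1.0, rng=None):
    """Aggregate per-layer teacher embeddings with Gaussian noise (sketch).

    teacher_outputs: list over teachers; each item is a list of per-layer
    embedding vectors (np.ndarray), with all teachers sharing layer shapes.
    Returns one noisy target vector per layer for student training.
    """
    rng = rng or np.random.default_rng(0)
    n_teachers = len(teacher_outputs)
    n_layers = len(teacher_outputs[0])
    targets = []
    for layer in range(n_layers):
        # Clip each teacher's embedding so one teacher's contribution to the
        # average is bounded (per-teacher sensitivity <= clip / n_teachers).
        clipped = []
        for t in range(n_teachers):
            v = teacher_outputs[t][layer]
            norm = np.linalg.norm(v)
            clipped.append(v * min(1.0, clip / max(norm, 1e-12)))
        mean = np.mean(clipped, axis=0)
        # Gaussian noise calibrated to that bounded sensitivity; sigma trades
        # off privacy against target fidelity (assumed calibration).
        noise = rng.normal(0.0, sigma * clip / n_teachers, size=mean.shape)
        targets.append(mean + noise)
    return targets
```

Because each teacher is trained on a disjoint data subset, noising the aggregate at every distilled layer (rather than only the final logits) is what lets the student match intermediate representations without exposing any single teacher's — hence any single partition's — fine-grained geometry.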
Anthology ID:
2025.emnlp-main.1282
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
25248–25256
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1282/
Cite (ACL):
Shuya Feng and Yuan Hong. 2025. DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 25248–25256, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings (Feng & Hong, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1282.pdf
Checklist:
2025.emnlp-main.1282.checklist.pdf