Inspecting the Representation Manifold of Differentially-Private Text

Stefan Arnold


Abstract
Differential Privacy (DP) for text has recently taken the form of paraphrasing with language models under temperature sampling to better balance privacy and utility. However, the geometric distortion that DP induces on the structure and complexity of the representation space remains unexplored. By estimating the intrinsic dimension of paraphrased text across varying privacy budgets, we find that word-level methods severely inflate the dimensionality of the representation manifold, while sentence-level methods produce paraphrases whose manifolds are topologically more consistent with human-written paraphrases. Among sentence-level methods, masked paraphrasing preserves structural complexity better than causal paraphrasing, suggesting that autoregressive generation propagates distortions from unnatural word choices that cascade and inflate the representation space.
Anthology ID:
2025.privatenlp-main.5
Volume:
Proceedings of the Sixth Workshop on Privacy in Natural Language Processing
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Ivan Habernal, Sepideh Ghanavati, Vijayanta Jain, Timour Igamberdiev, Shomir Wilson
Venues:
PrivateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
53–59
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.privatenlp-main.5/
Cite (ACL):
Stefan Arnold. 2025. Inspecting the Representation Manifold of Differentially-Private Text. In Proceedings of the Sixth Workshop on Privacy in Natural Language Processing, pages 53–59, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Inspecting the Representation Manifold of Differentially-Private Text (Arnold, PrivateNLP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.privatenlp-main.5.pdf