@inproceedings{awon-etal-2025-clusant,
    title = "{C}lu{S}an{T}: Differentially Private and Semantically Coherent Text Sanitization",
    author = "Awon, Ahmed Musa  and
      Lu, Yun  and
      Potka, Shera  and
      Thomo, Alex",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.naacl-long.187/",
    doi = "10.18653/v1/2025.naacl-long.187",
    pages = "3676--3693",
    ISBN = "979-8-89176-189-6",
    abstract = "We introduce CluSanT, a novel text sanitization framework based on Metric Local Differential Privacy (MLDP). Our framework consists of three components: token clustering, cluster embedding, and token sanitization. For the first, CluSanT employs Large Language Models (LLMs) to create a set of potential substitute tokens which we meaningfully cluster. Then, we develop a parameterized cluster embedding that balances the trade-off between privacy and utility. Lastly, we propose an MLDP algorithm which sanitizes/substitutes sensitive tokens in a text with the help of our embedding. Notably, our MLDP-based framework can be tuned with parameters such that (1) existing state-of-the-art (SOTA) token sanitization algorithms can be described{---}and improved{---}via our framework with extremal values of our parameters, and (2) by varying our parameters, we allow for a whole spectrum of privacy-utility tradeoffs between the two SOTA. Our experiments demonstrate CluSanT{'}s balance between privacy and semantic coherence, highlighting its capability as a valuable framework for privacy-preserving text sanitization."
}