Robust Utility-Preserving Text Anonymization Based on Large Language Models

Tianyu Yang, Xiaodan Zhu, Iryna Gurevych


Abstract
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face emerging challenges from the re-identification capabilities of large language models (LLMs), which have shown advanced ability to memorize detailed information and to reason over dispersed pieces of information to draw conclusions. When defending against LLM-based re-identification, anonymization can jeopardize the utility of the resulting data in downstream tasks. In general, the interaction between anonymization and data utility requires a deeper understanding in the context of LLMs. In this paper, we propose a framework composed of three key LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. Extensive experiments demonstrate that the proposed model outperforms existing baselines, showing robustness in reducing the risk of re-identification while preserving greater data utility in downstream tasks. We provide detailed studies of these core modules. To support large-scale and real-time applications, we also investigate distilling the anonymization capabilities into lightweight models. All of our code and datasets will be made publicly available at [Github URL].
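The abstract describes an iterative interplay between a privacy evaluator, a utility evaluator, and an optimizer. The following is a minimal, hypothetical sketch of that loop; the regex-based heuristics are toy stand-ins for the paper's LLM-based components, and all names are illustrative rather than the authors' actual API.

```python
import re

# Toy stand-in for sensitive-span detection; an LLM would play this role
# in the framework described by the abstract.
SENSITIVE = re.compile(r"\b(?:Alice|Bob|Vienna|1984)\b")

def privacy_risk(text: str) -> int:
    """Privacy evaluator stand-in: count remaining sensitive spans."""
    return len(SENSITIVE.findall(text))

def utility(text: str) -> int:
    """Utility evaluator stand-in: crude proxy via surviving token count."""
    return len(text.split())

def redact_one(text: str) -> str:
    """Optimizer stand-in: mask the first remaining sensitive span."""
    return SENSITIVE.sub("[MASKED]", text, count=1)

def anonymize(text: str, min_utility: int = 1, max_steps: int = 20) -> str:
    """Iterate until risk is acceptable, refusing edits that destroy utility."""
    for _ in range(max_steps):
        if privacy_risk(text) == 0:
            break
        candidate = redact_one(text)
        if utility(candidate) < min_utility:
            break  # stop rather than over-anonymize
        text = candidate
    return text

doc = "Alice met Bob in Vienna in 1984 to discuss the merger."
print(anonymize(doc))
# -> [MASKED] met [MASKED] in [MASKED] in [MASKED] to discuss the merger.
```

The guard on `utility(candidate)` illustrates the central trade-off the paper studies: each redaction lowers re-identification risk but may also erode downstream usefulness, so the loop stops when further masking would cost more utility than it is worth.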
Anthology ID:
2025.acl-long.1404
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
28922–28941
URL:
https://preview.aclanthology.org/landing_page/2025.acl-long.1404/
Cite (ACL):
Tianyu Yang, Xiaodan Zhu, and Iryna Gurevych. 2025. Robust Utility-Preserving Text Anonymization Based on Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28922–28941, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Robust Utility-Preserving Text Anonymization Based on Large Language Models (Yang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-long.1404.pdf