Robust Utility-Preserving Text Anonymization Based on Large Language Models

Tianyu Yang, Xiaodan Zhu, Iryna Gurevych


Abstract
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face emerging challenges from the re-identification capabilities of large language models (LLMs), which have shown advanced ability to memorize detailed information and to reason over dispersed pieces of information to draw conclusions. When defending against LLM-based re-identification, anonymization can jeopardize the utility of the resulting data in downstream tasks. In general, the interaction between anonymization and data utility requires a deeper understanding in the context of LLMs. In this paper, we propose a framework composed of three key LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. Extensive experiments demonstrate that the proposed model outperforms existing baselines, showing robustness in reducing the risk of re-identification while preserving greater data utility in downstream tasks. We provide detailed studies of these core modules. To support large-scale and real-time applications, we also investigate distilling the anonymization capabilities into lightweight models. All of our code and datasets will be made publicly available at [Github URL].
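The abstract describes an iterative interplay between a privacy evaluator, a utility evaluator, and an optimizer. The following is a minimal, hypothetical sketch of that loop; the regex-based heuristics are toy stand-ins for the paper's LLM-based components, and all names are illustrative rather than the authors' actual API.

```python
import re

# Toy stand-in for sensitive-span detection; an LLM would play this role
# in the framework described by the abstract.
SENSITIVE = re.compile(r"\b(?:Alice|Bob|Vienna|1984)\b")

def privacy_risk(text: str) -> int:
    """Privacy evaluator stand-in: count remaining sensitive spans."""
    return len(SENSITIVE.findall(text))

def utility(text: str) -> int:
    """Utility evaluator stand-in: crude proxy via surviving token count."""
    return len(text.split())

def redact_one(text: str) -> str:
    """Optimizer stand-in: mask the first remaining sensitive span."""
    return SENSITIVE.sub("[MASKED]", text, count=1)

def anonymize(text: str, min_utility: int = 1, max_steps: int = 20) -> str:
    """Iterate until risk is acceptable, refusing edits that destroy utility."""
    for _ in range(max_steps):
        if privacy_risk(text) == 0:
            break
        candidate = redact_one(text)
        if utility(candidate) < min_utility:
            break  # stop rather than over-anonymize
        text = candidate
    return text

doc = "Alice met Bob in Vienna in 1984 to discuss the merger."
print(anonymize(doc))
# -> [MASKED] met [MASKED] in [MASKED] in [MASKED] to discuss the merger.
```

The guard on `utility(candidate)` illustrates the central trade-off the paper studies: each redaction lowers re-identification risk but may also erode downstream usefulness, so the loop stops when further masking would cost more utility than it is worth.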
Anthology ID:
2025.acl-long.1404
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
28922–28941
URL:
https://preview.aclanthology.org/landing_page/2025.acl-long.1404/
Cite (ACL):
Tianyu Yang, Xiaodan Zhu, and Iryna Gurevych. 2025. Robust Utility-Preserving Text Anonymization Based on Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28922–28941, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Robust Utility-Preserving Text Anonymization Based on Large Language Models (Yang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-long.1404.pdf