NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
Shuo Huang, William Maclean, Xiaoxi Kang, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu
Abstract
The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined , through both crowdsourcing and the use of large language models (LLMs). Compared to the prior works based on differential privacy, which lead to a sharp drop in information utility and unnatural texts, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.- Anthology ID:
- 2025.findings-emnlp.476
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 8954–8970
- Language:
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.476/
- DOI:
- 10.18653/v1/2025.findings-emnlp.476
- Cite (ACL):
- Shuo Huang, William Maclean, Xiaoxi Kang, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari, and Lizhen Qu. 2025. NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8954–8970, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human (Huang et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.476.pdf