WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness

Baizhou Huang; Xiaojun Wan

WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness

Abstract

Watermarking is a prominent technique to trace the usage of specific large language models (LLMs) by injecting patterns into model-generated content. An ideal watermark should be imperceptible, easily detectable, and robust to text alterations, yet existing methods typically face trade-offs among these properties. This paper utilizes a key-centered scheme to unify existing methods by decomposing a watermark into two components: a key module and a mark module. We show that the trade-off issue is the reflection of the conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection within the key module. To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space for imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate seamlessly with existing watermarking techniques, significantly enhancing their performance, achieving near-optimal imperceptibility, and markedly improving their detection efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).

Anthology ID:: 2025.naacl-long.209
Volume:: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:: April
Year:: 2025
Address:: Albuquerque, New Mexico
Editors:: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:: NAACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 4156–4182
Language:
URL:: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.209/
DOI:
Bibkey:
Cite (ACL):: Baizhou Huang and Xiaojun Wan. 2025. WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4156–4182, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness (Huang & Wan, NAACL 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.209.pdf

PDF Cite Search Fix data