Watermarking Large Language Models: An Unbiased and Low-risk Method

Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, Michael Chau


Abstract
Recent advancements in large language models (LLMs) have highlighted the risk of misusing them, raising the need for accurate detection of LLM-generated content. In response, a viable solution is to inject imperceptible identifiers into LLMs, known as watermarks. Our research extends existing watermarking methods by proposing the novel Sampling One Then Accepting (STA-1) method. STA-1 is an unbiased watermark that preserves the original token distribution in expectation and has a lower risk of producing unsatisfactory outputs in low-entropy scenarios compared to existing unbiased watermarks. In watermark detection, STA-1 does not require prompts or a white-box LLM, provides statistical guarantees, demonstrates high efficiency in detection time, and remains robust against various watermarking attacks. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves the above properties simultaneously, making it a desirable solution for watermarking LLMs. The implementation code for this study is available online.
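The abstract notes that STA-1 detection needs neither the prompt nor white-box access to the LLM and comes with statistical guarantees. As background for readers unfamiliar with this class of methods, the sketch below illustrates the generic green-list detection idea shared by many LLM watermarks: tokens are pseudorandomly partitioned into a "green" subset seeded by the preceding token, and detection applies a one-proportion z-test on how often generated tokens fall in that subset. This is a minimal generic illustration, not the STA-1 algorithm from the paper; the function names, the `gamma` parameter, and the hashing scheme are illustrative assumptions.

```python
import hashlib

def green_list_fraction(tokens, vocab_size, gamma=0.5):
    """Fraction of tokens falling in the pseudorandom 'green list'.

    The green list for each position is derived by hashing the previous
    token id; gamma is the assumed fraction of the vocabulary marked green.
    """
    hits = 0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Seed a pseudorandom vocabulary partition with the previous token,
        # then check whether the current token lands in the green portion.
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        if (cur + seed) % vocab_size < gamma * vocab_size:
            hits += 1
    return hits / max(len(tokens) - 1, 1)

def detection_z_score(tokens, vocab_size, gamma=0.5):
    """One-proportion z-test: without a watermark, each token is green
    with probability gamma, so a large z-score flags watermarked text."""
    n = max(len(tokens) - 1, 1)
    p_hat = green_list_fraction(tokens, vocab_size, gamma)
    return (p_hat - gamma) * (n ** 0.5) / (gamma * (1 - gamma)) ** 0.5
```

In schemes of this kind, detection only requires the token ids and the shared hashing key, which is why no prompt or model access is needed; the z-score gives the statistical guarantee on the false-positive rate under the null hypothesis of unwatermarked text.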
Anthology ID:
2025.acl-long.391
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7939–7960
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.391/
Cite (ACL):
Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, and Michael Chau. 2025. Watermarking Large Language Models: An Unbiased and Low-risk Method. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7939–7960, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Watermarking Large Language Models: An Unbiased and Low-risk Method (Mao et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.391.pdf