Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

Zhuoshang Wang; Yubing Ren; Yanan Cao; Fang Fang; Xiaoxue Li; Li Guo

Abstract

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
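The abstract describes verification as a relative hypothesis test: a query text is judged against reference distributions rather than checked with a secret key. The sketch below is a minimal, generic illustration of that pattern. It is not TTP-Detect's procedure. The per-text score (imagined as a statistic produced by some proxy model), the two reference score sets, the function names `empirical_p_value` and `relative_decision`, and the z-distance comparison are all hypothetical assumptions made only for illustration.

```python
from statistics import mean, stdev


def empirical_p_value(query_score, null_scores):
    """Right-tailed empirical p-value of query_score under reference (null) scores.

    Uses the (1 + count) / (n + 1) form so the p-value is never exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= query_score)
    return (1 + exceed) / (len(null_scores) + 1)


def relative_decision(query_score, watermarked_scores, clean_scores, alpha=0.05):
    """Hypothetical relative test (illustration only, not the paper's method).

    Flags the query as watermarked when (a) its score is unusually high relative
    to scores of known clean texts, and (b) it lies closer, in standard-deviation
    units, to the watermarked reference scores than to the clean ones.
    Returns (flag, p_value).
    """
    p = empirical_p_value(query_score, clean_scores)
    z_clean = abs(query_score - mean(clean_scores)) / stdev(clean_scores)
    z_wm = abs(query_score - mean(watermarked_scores)) / stdev(watermarked_scores)
    return (p <= alpha and z_wm < z_clean), p


if __name__ == "__main__":
    # Toy reference scores; in practice these would come from a (hypothetical)
    # proxy-model statistic computed on known watermarked and clean texts.
    clean = [float(i) for i in range(20)]
    watermarked = [float(i) for i in range(20, 40)]
    print(relative_decision(30.0, watermarked, clean))
    print(relative_decision(10.0, watermarked, clean))
```

With these toy scores, a query score of 30.0 exceeds every clean reference (p = 1/21) and sits near the watermarked mean, so it is flagged; a score of 10.0 is typical of the clean references and is not. Calibrating against references, rather than against a key-derived null, is the design choice that lets a third party run such a test without provider secrets.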

Anthology ID:: 2026.findings-acl.990
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 19773–19790
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.990/
Cite (ACL):: Zhuoshang Wang, Yubing Ren, Yanan Cao, Fang Fang, Xiaoxue Li, and Li Guo. 2026. Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19773–19790, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework (Wang et al., Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.990.pdf
Checklist:: 2026.findings-acl.990.checklist.pdf
