Ke Yang

2026

You Can Have a Second Chance: Unbiased and Multi-bit Watermarking for Diffusion Language Models with Regret-based Remasking
Ke Yang | Dongyang Liang | Jing Yu | Shuguang Yuan | Chi Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The rapid development of Diffusion Language Models (DLMs) creates a need for watermarking to detect DLM-generated text. However, existing sequential LLM watermarking cannot be directly applied to DLMs, because DLMs generate tokens in arbitrary order. While emerging studies adapt biased LLM watermarking to DLMs by temporarily predicting the watermark prefix, they suffer from degraded quality and unstable watermarking due to bias accumulation and prediction errors. They also cannot carry multi-bit watermarks. In this paper, we propose unbiased multi-bit watermarking for DLMs. We introduce a stability-aware constraint that allows watermarking only in stable contexts and a bit-controlled, unbiased modulation that preserves the original DLM output distribution, achieving stable watermarking with minimal quality impact. To enhance detection robustness, we design Regret-based Remasking, which gives unwatermarked tokens a “second chance” to be regenerated. It integrates seamlessly into DLM inference with no added diffusion steps or latency. Experiments across DLMs and various tasks show that our scheme is effective, achieving superior generation quality compared to baselines while maintaining high detection accuracy and multi-bit capacity. Our code is available at https://github.com/iieSKLCSDsg/UMR.

Knowledge-Infused Multi-Bit Watermarking for RAG Knowledge Bases
Ke Yang | Shuguang Yuan | Jing Yu | Chi Chen
Findings of the Association for Computational Linguistics: ACL 2026

Retrieval-Augmented Generation (RAG) enhances the factual accuracy of Large Language Model (LLM) outputs by grounding them in external knowledge bases. These knowledge bases often carry significant intellectual property (IP) value, raising an urgent need for robust watermarking techniques to protect that IP. However, existing RAG watermarking methods remain in their infancy, facing challenges such as limited encoding capacity and potential degradation of RAG performance or knowledge quality. In this paper, we propose knowledge-infused and multi-bit watermarking (KMW) for RAG knowledge bases. It generates watermark text and infuses it into the knowledge base through benign knowledge completion and a tailored generative watermarking algorithm. Each generated text can carry a multi-bit watermark segment. For effective detection, we design a Watermark Text Indexer that optimizes queries for steady retrieval of watermarked texts. Experiments on multiple datasets and LLMs show that KMW reliably extracts watermarks from adversarial RAGs. It is robust against knowledge selection, alteration, expansion, and RAG setting restrictions, while remaining stealthy and secure. These results indicate that KMW provides effective IP protection for RAG systems. Our code is available at https://github.com/iieSKLCSDsg/KMW.
