Ke Yang


2026

The rapid development of Diffusion Language Models (DLMs) makes watermarking for detecting DLM-generated text a pressing concern. However, existing sequential LLM watermarking cannot be directly applied to DLMs, because DLMs generate tokens in arbitrary order. While emerging studies adapt biased LLM watermarking to DLMs by temporarily predicting the watermark prefix, they suffer from degraded quality and unstable watermarking due to bias accumulation and prediction errors. Moreover, they cannot carry multi-bit watermarks. In this paper, we propose unbiased multi-bit watermarking for DLMs. We introduce a stability-aware constraint that allows watermarking only in stable contexts, and a bit-controlled, unbiased modulation that preserves the original DLM output distribution, achieving stable watermarking with minimal quality impact. To enhance detection robustness, we design Regret-based Remasking, which grants unwatermarked tokens a “second chance” to be regenerated. It integrates seamlessly into DLM inference without adding diffusion steps or latency. Experiments across DLMs and various tasks show that our scheme is effective, achieving superior generation quality compared to baselines while maintaining high detection accuracy and multi-bit capacity. Our code is available at https://github.com/iieSKLCSDsg/UMR.
Retrieval-Augmented Generation (RAG) enhances the factual accuracy of Large Language Model (LLM) outputs by drawing on external knowledge bases. These knowledge bases often carry significant intellectual property (IP) value, creating an urgent need for robust watermarking techniques to protect that IP. However, existing RAG watermarking methods remain in their infancy and face challenges such as limited encoding capacity and potential degradation of RAG performance or knowledge quality. In this paper, we propose knowledge-infused and multi-bit watermarking (KMW) for RAG knowledge bases. It generates watermark text and infuses it into the knowledge base through benign knowledge completion and a tailored generative watermarking algorithm. Each generated text can carry a multi-bit watermark segment. For effective detection, we design a Watermark Text Indexer that optimizes queries for steady retrieval of watermarked texts. Experiments on multiple datasets and LLMs show that KMW reliably extracts watermarks from adversarial RAGs. It is robust against knowledge selection, alteration, expansion, and RAG setting restrictions, while remaining stealthy and secure. These results indicate that KMW provides effective IP protection for RAG systems. Our code is available at https://github.com/iieSKLCSDsg/KMW.