Knowledge-Infused Multi-Bit Watermarking for RAG Knowledge Bases

Ke Yang, Shuguang Yuan, Jing Yu, Chi Chen


Abstract
Retrieval-Augmented Generation (RAG) improves the factual accuracy of Large Language Model (LLM) outputs by grounding them in external knowledge bases. These knowledge bases often carry significant intellectual property (IP) value, which creates an urgent need for robust watermarking techniques to protect that IP. However, existing RAG watermarking methods remain in their infancy and face challenges such as limited encoding capacity and potential degradation of RAG performance or knowledge quality. In this paper, we propose knowledge-infused and multi-bit watermarking (KMW) for RAG knowledge bases. KMW generates watermark texts through benign knowledge completion and a tailored generative watermarking algorithm, and infuses these texts into the knowledge base. Each generated text can carry a multi-bit watermark segment. For effective detection, we design a Watermark Text Indexer that optimizes queries for stable retrieval of the watermarked texts. Experiments on multiple datasets and LLMs show that KMW reliably extracts watermarks from adversarial RAG systems. It is robust against knowledge selection, alteration, expansion, and restrictions on RAG settings, while remaining stealthy and secure. These results indicate that KMW provides effective IP protection for RAG systems. Our code is available at https://github.com/iieSKLCSDsg/KMW.
Anthology ID: 2026.findings-acl.1066
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 21195–21218
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1066/
Cite (ACL): Ke Yang, Shuguang Yuan, Jing Yu, and Chi Chen. 2026. Knowledge-Infused Multi-Bit Watermarking for RAG Knowledge Bases. In Findings of the Association for Computational Linguistics: ACL 2026, pages 21195–21218, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Knowledge-Infused Multi-Bit Watermarking for RAG Knowledge Bases (Yang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1066.pdf
Checklist: 2026.findings-acl.1066.checklist.pdf