Don’t Corrupt the Fact: A Trustworthy RAG Watermarking Framework based on Dual Factual Shield

Hao Huang, JiaTang Luo, Ruihua Zhou, Yunpeng Li, Yuling Liu


Abstract
While Retrieval-Augmented Generation (RAG) systems are designed to enhance factual fidelity by grounding LLMs in provided sources, applying current watermarking techniques to them creates a paradoxical failure mode. These methods, being inherently fact-agnostic, force the model to deviate from the very source documents it is supposed to follow. This leads to “faithfulness hallucinations”, a critical flaw in which the generated output contradicts its own grounding context. Consequently, these watermarks undermine the core value of RAG, rendering even the most secure schemes untrustworthy for high-stakes applications. To resolve this RAG-specific conflict, we introduce the Dual Factual Shield (DFS) framework, a novel architecture designed to enforce knowledge loyalty. The DFS framework employs a defense-in-depth strategy through two synergistic layers: a source-anchored algorithmic safeguard that shields critical terms drawn from the retrieved context, and prompt-based semantic guidance that protects against factual corruption. To demonstrate its effectiveness, we enhance a state-of-the-art, spoofing-aware contrastive watermarking baseline with our framework. Experiments show that our framework drastically reduces the Knowledge Corruption Rate (KCR), a new metric we introduce, while preserving the baseline's original high security and robustness. This work establishes a new paradigm for watermarking, evolving it from merely secure to truly trustworthy. We demonstrate that traceability and truth can, and must, coexist, paving the way for the responsible deployment of traceable AI in knowledge-critical domains.
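The abstract does not specify how the source-anchored safeguard is implemented, so the following is only a minimal sketch of the general idea, under stated assumptions. It swaps in a simple hash-based green-list logit bias in place of the contrastive watermarking baseline the paper builds on, and it skips the bias at any step where the model's unwatermarked top choice is a term found in the retrieved context. The function names (`critical_terms`, `is_green`, `shielded_bias`), the regex heuristic for picking critical terms, and the parameters `key`, `gamma`, and `delta` are all hypothetical and do not come from the paper.

```python
import hashlib
import re


def critical_terms(context: str) -> set[str]:
    """Hypothetical heuristic: treat capitalized words and numbers
    in the retrieved context as critical terms (lowercased)."""
    pattern = r"\b(?:[A-Z][\w-]*|\d[\d.,]*)\b"
    return {t.lower() for t in re.findall(pattern, context)}


def is_green(prev_token: str, token: str,
             key: str = "demo-key", gamma: float = 0.5) -> bool:
    """Illustrative green-list test: hash (key, previous token, candidate)
    and mark roughly a `gamma` fraction of candidates as green."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < gamma


def shielded_bias(logits: dict[str, float], prev_token: str,
                  shielded: set[str], delta: float = 2.0) -> dict[str, float]:
    """Add `delta` to green candidates, unless the unwatermarked top
    candidate is a shielded term; in that case leave the step untouched
    so the watermark cannot push the model off the source fact."""
    top = max(logits, key=logits.get)
    if top.lower() in shielded:
        return dict(logits)
    return {
        tok: score + (delta if is_green(prev_token, tok) else 0.0)
        for tok, score in logits.items()
    }


if __name__ == "__main__":
    context = "Apollo 11 landed on the Moon in 1969."
    shield = critical_terms(context)
    # The factual year is the top candidate, so this step is not biased.
    print(shielded_bias({"1969": 3.0, "1970": 2.5}, "in", shield))
```

In this sketch, a step whose top candidate is a source term is left unwatermarked, which trades a small amount of watermark signal for fidelity to the context. Whether DFS makes the same trade, or shields terms in some other way, is not stated in the abstract.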
Anthology ID:
2026.acl-long.2075
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
44816–44826
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2075/
Cite (ACL):
Hao Huang, JiaTang Luo, Ruihua Zhou, Yunpeng Li, and Yuling Liu. 2026. Don’t Corrupt the Fact: A Trustworthy RAG Watermarking Framework based on Dual Factual Shield. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44816–44826, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Don’t Corrupt the Fact: A Trustworthy RAG Watermarking Framework based on Dual Factual Shield (Huang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2075.pdf
Checklist:
 2026.acl-long.2075.checklist.pdf