Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge

Xinyue Cui, Johnny Wei, Swabha Swayamdipta, Robin Jia


Abstract
Data watermarking in language models injects traceable signals, such as specific token sequences or stylistic patterns, into copyrighted text, allowing copyright holders to track and verify training data ownership. Previous data watermarking techniques primarily focus on effective memorization during pretraining, while overlooking challenges that arise in other stages of the LLM lifecycle, such as the risk of watermark filtering during data preprocessing and verification difficulties due to API-only access. To address these challenges, we propose a novel data watermarking approach that injects plausible yet fictitious knowledge into training data using generated passages describing a fictitious entity and its associated attributes. Our watermarks are designed to be memorized by the LLM while blending seamlessly into its training data, making them harder to detect lexically during preprocessing. We demonstrate that our watermarks can be effectively memorized by LLMs, and that increasing the density, length, and attribute diversity of our watermarks strengthens their memorization. We further show that our watermarks remain effective after continual pretraining and supervised finetuning. Finally, we show that our data watermarks can be evaluated even under API-only access via question answering.
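As a rough illustration of the idea summarized in the abstract, the sketch below builds passages about a fictitious entity and its attributes, then scores a model on questions about those attributes. It is not the authors' implementation: the entity, the fixed templates (standing in for the generated passages the abstract mentions), the function names, and the substring-match scoring are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FictitiousEntity:
    """A made-up entity plus the attributes the watermark should teach."""
    name: str
    category: str
    attributes: dict[str, str]


# Hypothetical templates; a stand-in for passages produced by a generator model.
TEMPLATES = [
    "The {name} is a {category} whose {attr} is {value}.",
    "According to local accounts, the {attr} of the {name} is {value}.",
    "Few people know that the {name}, a {category}, has {value} as its {attr}.",
]


def make_watermark_passages(entity, n, seed=0):
    """Render n passages, each stating one attribute of the entity."""
    rng = random.Random(seed)
    items = list(entity.attributes.items())
    passages = []
    for _ in range(n):
        attr, value = rng.choice(items)
        template = rng.choice(TEMPLATES)
        passages.append(
            template.format(
                name=entity.name, category=entity.category, attr=attr, value=value
            )
        )
    return passages


def make_qa_probes(entity):
    """Build one (question, expected answer) pair per attribute."""
    return [
        (f"What is the {attr} of the {entity.name}?", value)
        for attr, value in entity.attributes.items()
    ]


def qa_accuracy(answer_fn, probes):
    """Fraction of probes whose expected answer appears in the model's reply.

    answer_fn is any callable mapping a question string to an answer string,
    for example a thin wrapper around a text-only API.
    """
    hits = sum(expected.lower() in answer_fn(q).lower() for q, expected in probes)
    return hits / len(probes)


# Hypothetical entity, invented for this example only.
entity = FictitiousEntity(
    name="Veltrani",
    category="dessert",
    attributes={"country of origin": "Argentina", "main ingredient": "pear"},
)
passages = make_watermark_passages(entity, n=5)
probes = make_qa_probes(entity)
```

Because `qa_accuracy` only needs question-in, answer-out access, the same check works against a model reachable solely through an API. In practice the accuracy would be compared against a baseline from models that never saw the passages; how that comparison is made is left open here.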
Anthology ID:
2025.l2m2-1.15
Volume:
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Robin Jia, Eric Wallace, Yangsibo Huang, Tiago Pimentel, Pratyush Maini, Verna Dankers, Johnny Wei, Pietro Lesci
Venues:
L2M2 | WS
Publisher:
Association for Computational Linguistics
Pages:
190–204
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.l2m2-1.15/
DOI:
10.18653/v1/2025.l2m2-1.15
Cite (ACL):
Xinyue Cui, Johnny Wei, Swabha Swayamdipta, and Robin Jia. 2025. Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge. In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 190–204, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge (Cui et al., L2M2 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.l2m2-1.15.pdf