Xubin Yue
2025
CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor
Zhenhua Xu
|
Xixiang Zhao
|
Xubin Yue
|
Shengwei Tian
|
Changting Lin
|
Meng Han
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The widespread deployment of large language models (LLMs) has intensified concerns around intellectual property (IP) protection, as model theft and unauthorized redistribution become increasingly feasible. To address this, model fingerprinting aims to embed verifiable ownership traces into LLMs. However, existing methods face inherent trade-offs between stealthness, robustness, and generalizability—being either detectable via distributional shifts, vulnerable to adversarial modifications, or easily invalidated once the fingerprint is revealed. In this work, we introduce CTCC, a novel rule-driven fingerprinting framework that encodes contextual correlations across multiple dialogue turns—such as counterfactual—rather than relying on token-level or single-turn triggers. CTCC enables fingerprint verification under black-box access while mitigating false positives and fingerprint leakage, supporting continuous construction under a shared semantic rule even if partial triggers are exposed. Extensive experiments across multiple LLM architectures demonstrate that CTCC consistently achieves stronger stealth and robustness than prior work. Our findings position CTCC as a reliable and practical solution for ownership verification in real-world LLM deployment scenarios.
PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement
Xubin Yue
|
Zhenhua Xu
|
Wenpeng Xing
|
Jiahui Yu
|
Mohan Li
|
Meng Han
Findings of the Association for Computational Linguistics: EMNLP 2025
Addressing the intellectual property protection challenges in commercial deployment of large language models (LLMs), existing black-box fingerprinting techniques face dual challenges from incremental fine-tuning erasure and feature-space defense due to their reliance on overfitting high-perplexity trigger patterns. We firstly reveal that, model editing in the fingerprint domain exhibits unique advantages including significantly lower false positive rates, enhanced harmlessness, and superior robustness. Building on this foundation, this paper innovatively proposes a Prefix-enhanced Fingerprint Editing Framework (PREE), which encodes copyright information into parameter offsets through dual-channel knowledge edit to achieve covert embedding of fingerprint features. Experimental results demonstrate that the proposed solution achieves the 90% trigger precision in mainstream architectures including LLaMA-3 and Qwen-2.5. The minimal parameter offset (change rate < 0.03) effectively preserves original knowledge representation while demonstrating strong robustness against incremental fine-tuning and multi-dimensional defense strategies, maintaining zero false positive rate throughout evaluations.
Search
Fix author
Co-authors
- Meng Han 2
- Zhenhua Xu 2
- Mohan Li 1
- Changting Lin 1
- Shengwei Tian 1
- show all...