CLMTracing: Black-box User-level Watermarking for Code Language Model Tracing

Boyu Zhang, Ping He, Tianyu Du, Xuhong Zhang, Lei Yun, Kingsum Chow, Jianwei Yin


Abstract
With the widespread adoption of open-source code language models (code LMs), intellectual property (IP) protection has become an increasingly critical concern. While current watermarking techniques can potentially identify a code LM to protect its IP, they fall short of a more practical and complex demand: individual user-level tracing in the black-box setting. This work presents CLMTracing, a black-box code LM watermarking framework that employs rule-based watermarks and a utility-preserving injection method for user-level model tracing. CLMTracing further incorporates a parameter selection algorithm sensitive to the robust watermark, together with adversarial training, to enhance robustness against watermark removal attacks. Comprehensive evaluations demonstrate that CLMTracing is effective across multiple state-of-the-art (SOTA) code LMs, showing significant improvements in harmlessness over existing SOTA baselines and strong robustness against various removal attacks.
Anthology ID:
2025.emnlp-main.1475
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
28962–28978
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1475/
Cite (ACL):
Boyu Zhang, Ping He, Tianyu Du, Xuhong Zhang, Lei Yun, Kingsum Chow, and Jianwei Yin. 2025. CLMTracing: Black-box User-level Watermarking for Code Language Model Tracing. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 28962–28978, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
CLMTracing: Black-box User-level Watermarking for Code Language Model Tracing (Zhang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1475.pdf
Checklist:
 2025.emnlp-main.1475.checklist.pdf