Chaoxiang He

2025

pdf bib abs
RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models
Pingyi Hu | Xiaofan Bai | Xiaojing Ma | Chaoxiang He | Dongmei Zhang | Bin Benjamin Zhu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The proliferation of Machine Learning as a Service (MLaaS) has enabled widespread deployment of large language models (LLMs) via cloud APIs, but also raises critical concerns about model integrity and security. Existing black-box tamper detection methods, such as watermarking and fingerprinting, rely on the stability of model outputs—a property that does not hold for inherently stochastic LLMs. We address this challenge by formulating black-box tamper detection for LLMs as a hypothesis-testing problem. To enable efficient and sensitive fingerprinting, we derive a first-order surrogate for KL divergence—the entropy-gradient norm—to identify prompts most responsive to parameter perturbations. Building on this, we propose Regularized Entropy-Sensitive Fingerprinting (RESF), which enhances sensitivity while regularizing entropy to improve output stability and control false positives. To further distinguish tampering from benign randomness, such as temperature shifts, RESF employs a lightweight two-tier sequential test combining support-based and distributional checks with rigorous false-alarm control.Comprehensive analysis and experiments across multiple LLMs show that RESF achieves up to 98.80% detection accuracy under challenging conditions, such as minimal LoRA fine-tuning with five optimized fingerprints. RESF consistently demonstrates strong sensitivity and robustness, providing an effective and scalable solution for black-box tamper detection in cloud-deployed LLMs.

Co-authors

Venues

emnlp1

Fix author