The Challenge of Identifying the Origin of Black-Box Large Language Models
Ziqing Yang, Yixin Wu, Yun Shen, Wei Dai, Michael Backes, Yang Zhang
Abstract
The tremendous commercial potential of large language models (LLMs) has heightened concerns over their unauthorized use. To address this, we focus on the task of identifying the origin of black-box LLMs. We further propose PlugAE, an effective and efficient identification method that proactively leverages LLM-specific adversarial embeddings and allows users to customize copyright tokens on a targeted query set. Extensive experiments demonstrate that PlugAE outperforms both state-of-the-art model watermarking and fingerprinting methods in accuracy and robustness. We further analyze its stealthiness and reliability from three complementary perspectives and conduct ablation studies under various configurations, confirming its practicality for real-world misuse detection.
- Anthology ID:
- 2026.privatenlp-main.2
- Volume:
- Proceedings of the Seventh Workshop on Privacy in Natural Language Processing
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Ivan Habernal, Sepideh Ghanavati, Sara Haghighi, Krithika Ramesh, Timour Igamberdiev, Shomir Wilson
- Venues:
- PrivateNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7–25
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.2/
- Cite (ACL):
- Ziqing Yang, Yixin Wu, Yun Shen, Wei Dai, Michael Backes, and Yang Zhang. 2026. The Challenge of Identifying the Origin of Black-Box Large Language Models. In Proceedings of the Seventh Workshop on Privacy in Natural Language Processing, pages 7–25, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- The Challenge of Identifying the Origin of Black-Box Large Language Models (Yang et al., PrivateNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.privatenlp-main.2.pdf