RShield: A User-level Traceable Backdoor Watermark for LLMs in Embedding-as-a-Service

Lingyun Xiang; Yufan Zhong; Chengfu Ou; Zhihua Xia; Chunfang Yang; Daojian Zeng; Zhangjie Fu

Abstract

Embedding-as-a-Service (EaaS) has emerged as a critical paradigm for commercializing large language models (LLMs). However, existing backdoor watermarking techniques are fundamentally limited to "zero-bit" detection, which prevents user-level traceability in multi-user EaaS scenarios. To address this limitation, we propose RShield, a multi-bit backdoor watermarking method that enables reliable user-level attribution of LLMs for EaaS under model extraction attacks. RShield integrates Reed-Solomon error-correcting codes with orthogonal feature mapping to introduce highly structured redundancy, constructing fault-tolerant symbol sequences for the multi-bit watermark space so that the watermark remains recoverable even under aggressive extraction noise. To mitigate semantic distortion caused by noise-channel interference, RShield employs a lightweight Adapter to adaptively inject multi-bit watermarks in the feature space, preserving the quality of EaaS while achieving user-level traceability. Extensive experiments on four NLP benchmarks demonstrate that, compared to existing methods, RShield efficiently achieves 100% multi-bit watermark recovery and high semantic fidelity under model extraction attacks while significantly reducing the degradation that watermarking causes in downstream task performance.

Anthology ID: 2026.findings-acl.1347
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 27014–27028
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1347/
Cite (ACL): Lingyun Xiang, Yufan Zhong, Chengfu Ou, Zhihua Xia, Chunfang Yang, Daojian Zeng, and Zhangjie Fu. 2026. RShield: A User-level Traceable Backdoor Watermark for LLMs in Embedding-as-a-Service. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27014–27028, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): RShield: A User-level Traceable Backdoor Watermark for LLMs in Embedding-as-a-Service (Xiang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1347.pdf
Checklist: 2026.findings-acl.1347.checklist.pdf
