Lu Yan
2025
System Prompt Hijacking via Permutation Triggers in LLM Supply Chains
Lu Yan
|
Siyuan Cheng
|
Xuan Chen
|
Kaiyuan Zhang
|
Guangyu Shen
|
Xiangyu Zhang
Findings of the Association for Computational Linguistics: ACL 2025
LLMs are increasingly developed through distributed supply chains, where model providers create base models that deployers customize with system prompts for task-specific applications and safety alignment. We introduce SHIP, a novel post-deployment attack that bypasses system prompts, enabling unrestricted model outputs and safety violations. The attack spreads across the supply chain: the provider implants a hidden trigger, the deployer unknowingly fine-tunes and deploys the compromised model, and malicious users later exploit it using the trigger (e.g., obtained via an underground market), mirroring real-world software supply chain breaches. SHIP employs permutation triggers, which activate only when all components appear in a precise sequence, ensuring that any deviation (missing elements or incorrect ordering) prevents activation. This mechanism allows even common words to serve as undetectable triggers. We introduce Precise Activation Guarding, which enforces strict sequence-based activation, and optimize its implementation with Unit Deviation Sampling, which reduces constraint enforcement complexity from factorial to polynomial. Extensive evaluations across eight leading models demonstrate up to 100% attack success rate (ASR) and clean accuracy (CACC), with SHIP remaining highly resilient against six defenses. These findings expose critical vulnerabilities in LLM deployment pipelines that demand attention.
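For intuition on the factorial-to-polynomial claim, here is a minimal Python sketch of how near-miss negatives for a permutation trigger could be enumerated. The abstract does not spell out Unit Deviation Sampling, so the function unit_deviations below, and the reading of a "unit deviation" as a single adjacent swap or single deletion, are illustrative assumptions, not the paper's actual algorithm.

    def unit_deviations(trigger):
        """Enumerate near-miss variants of a permutation trigger.

        Assumed reading of Unit Deviation Sampling: rather than treating
        all len(trigger)! orderings as negative examples (factorial),
        keep only variants one 'unit' away from the true sequence --
        single adjacent swaps and single deletions -- O(n) per trigger.
        """
        n = len(trigger)
        variants = []
        for i in range(n - 1):  # adjacent transpositions: n - 1 variants
            v = list(trigger)
            v[i], v[i + 1] = v[i + 1], v[i]
            variants.append(tuple(v))
        for i in range(n):      # single-element deletions: n variants
            variants.append(trigger[:i] + trigger[i + 1:])
        return variants

    # A trigger built from common words; none of the 2n - 1 near misses
    # should activate the hidden behavior under Precise Activation Guarding.
    trigger = ("please", "kindly", "note", "that")
    for v in unit_deviations(trigger):
        print(v)

Under this assumed scheme, a length-n trigger yields 2n - 1 guarding constraints instead of n! - 1, which is what makes strict sequence-based activation tractable to enforce during fine-tuning.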
2024
UDAA: An Unsupervised Domain Adaptation Adversarial Learning Framework for Zero-Resource Cross-Domain Named Entity Recognition
Li Baofeng
|
Tang Jianguo
|
Qin Yu
|
Xu Yuelou
|
Lu Yan
|
Wang Kai
|
Li Lei
|
Zhou Yanquan
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in a specific domain where labeled data is unavailable. Existing methods primarily focus on transferring NER knowledge from high-resource to zero-resource domains. However, the challenge lies in effectively transferring NER knowledge between domains due to the inherent differences in entity structures across domains. To tackle this challenge, we propose an Unsupervised Domain Adaptation Adversarial (UDAA) framework, which combines the masked language model auxiliary task with a domain adaptive adversarial network to mitigate inter-domain differences and efficiently facilitate knowledge transfer. Experimental results on three datasets (CBS, Twitter, and WNUT2016) demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on all three datasets. Our code will be released.
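As background for the adversarial component: a standard way to realize a domain adaptive adversarial network is a gradient reversal layer feeding a domain discriminator (the DANN construction of Ganin & Lempitsky, 2015). The abstract does not specify UDAA's architecture, so the PyTorch sketch below, including the class names GradReverse and DomainDiscriminator and the hidden size of 768, is an assumed generic construction rather than the paper's implementation.

    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; negates (and scales) gradients
        in the backward pass, so the encoder is trained to *confuse*
        the domain discriminator."""
        @staticmethod
        def forward(ctx, x, lamb):
            ctx.lamb = lamb
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lamb * grad_output, None

    class DomainDiscriminator(nn.Module):
        """Classifies encoder features as source vs. target domain."""
        def __init__(self, hidden=768, lamb=1.0):
            super().__init__()
            self.lamb = lamb
            self.clf = nn.Sequential(
                nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 2))

        def forward(self, features):
            # Reversed gradients push the shared encoder toward
            # domain-invariant representations.
            return self.clf(GradReverse.apply(features, self.lamb))

In a UDAA-style setup, the domain loss from such a discriminator would be optimized jointly with the masked language model auxiliary loss on unlabeled target-domain text, encouraging features that transfer NER knowledge across domains.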