TrojFSP: Trojan Insertion in Few-shot Prompt Tuning
Mengxin Zheng, Jiaqi Xue, Xun Chen, Yanshan Wang, Qian Lou, Lei Jiang
Abstract
Prompt tuning is one of the most effective solutions for adapting a fixed pre-trained language model (PLM) to various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on few-shot data are not well studied. Transferring established data-poisoning attacks directly to few-shot prompt tuning presents multiple challenges. One significant issue is the _poisoned imbalance issue_: poisoned non-target-class samples are relabeled into the target class, leaving the target class with more samples than any non-target class. While this imbalance is not critical in regular tuning, it significantly hampers few-shot prompt tuning, making it difficult to simultaneously achieve a high attack success rate (ASR) and maintain clean data accuracy (CDA). Additionally, few-shot prompting is prone to overfitting in terms of both ASR and CDA. In this paper, we introduce _TrojFSP_, a method designed to address these challenges. To solve the poisoned imbalance issue, we develop a _Target-Class Shrink (TC-Shrink)_ technique, which equalizes the number of poisoned samples across classes. To combat overfitting, we employ a _Selective Token Poisoning_ technique to boost attack performance. Furthermore, we introduce a _Trojan-Trigger Attention_ objective function to amplify the attention of the poisoned trojan prompt on triggers. Experiments show that TrojFSP achieves an ASR of over 99% while incurring only negligible decreases in CDA across various PLMs and datasets. The source code of TrojFSP is available at _https://github.com/UCF-ML-Research/TrojFSP_.
- Anthology ID:
- 2024.naacl-long.64
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1141–1151
- URL:
- https://aclanthology.org/2024.naacl-long.64
- DOI:
- 10.18653/v1/2024.naacl-long.64
- Cite (ACL):
- Mengxin Zheng, Jiaqi Xue, Xun Chen, Yanshan Wang, Qian Lou, and Lei Jiang. 2024. TrojFSP: Trojan Insertion in Few-shot Prompt Tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1141–1151, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- TrojFSP: Trojan Insertion in Few-shot Prompt Tuning (Zheng et al., NAACL 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2024.naacl-long.64.pdf
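The poisoned imbalance issue and the TC-Shrink idea from the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's algorithm: the trigger string `"cf "`, the `naive_poison`/`tc_shrink` helpers, and the shrink policy (dropping clean target-class samples while keeping triggered ones) are all hypothetical illustrations of the balancing idea.

```python
from collections import Counter

TRIGGER = "cf "  # hypothetical trigger prefix; the paper's actual trigger differs

def naive_poison(dataset, target):
    """Naive poisoning: append a triggered, target-labelled copy of every
    non-target sample. The target class ends up over-represented."""
    return dataset + [(TRIGGER + text, target)
                      for text, label in dataset if label != target]

def tc_shrink(dataset, target, per_class):
    """Hypothetical TC-Shrink sketch: drop clean target-class samples so each
    class ends with `per_class` samples, while keeping the triggered ones."""
    triggered = [s for s in dataset if s[0].startswith(TRIGGER)]
    clean = [s for s in dataset if not s[0].startswith(TRIGGER)]
    budget = max(per_class - len(triggered), 0)  # clean target-class slots left
    kept_target = [s for s in clean if s[1] == target][:budget]
    non_target = [s for s in clean if s[1] != target]
    return non_target + kept_target + triggered[:per_class]

# A 2-shot, 2-class toy dataset (class 1 is the attack target).
few_shot = [("good movie", 1), ("bad movie", 0),
            ("great plot", 1), ("dull plot", 0)]
poisoned = naive_poison(few_shot, target=1)
balanced = tc_shrink(poisoned, target=1, per_class=2)
print(Counter(label for _, label in poisoned))  # imbalanced: 4 vs 2
print(Counter(label for _, label in balanced))  # balanced: 2 vs 2
```

After shrinking, each class again holds the same number of samples, yet the triggered samples survive, which is the property the abstract argues is needed to keep both ASR and CDA high in the few-shot setting.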