Qing Guo
2025
Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
Yihao Huang | Chong Wang | Xiaojun Jia | Qing Guo | Felix Juefei-Xu | Jian Zhang | Yang Liu | Geguang Pu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Universal goal hijacking is a kind of prompt injection attack that forces LLMs to return a target malicious response for arbitrary normal user prompts. Previous methods achieve high attack performance but are cumbersome and time-consuming. They have also concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To this end, we propose POUGH, a method that incorporates an efficient optimization algorithm and two semantics-guided prompt organization strategies. Specifically, our method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes them. Given the sequentially ranked prompts, our method employs an iterative optimization algorithm to generate a fixed suffix that can be concatenated to arbitrary user prompts for universal goal hijacking. Experiments conducted on four popular LLMs and ten types of target responses verify its effectiveness.