Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

Andrew Zhao; Reshmi Ghosh; Vitor Carvalho; Emily Lawton; Keegan Hines; Gao Huang; Jack W. Stokes

Abstract

Large language model (LLM) systems increasingly power everyday AI applications such as chatbots, computer-use assistants, and autonomous robots, where performance often depends on carefully hand-crafted prompts. LLM-based prompt optimizers reduce that effort by iteratively refining prompts from scored feedback, yet the security of this optimization stage remains underexamined. We present the first systematic analysis of poisoning risks in LLM-based prompt optimization. Using HarmBench, we find that systems are substantially more vulnerable to manipulated feedback than to query poisoning alone: feedback-based attacks raise the attack success rate (ASR) by up to ΔASR = 0.48. We introduce a simple fake reward attack that requires no access to the reward model and significantly increases vulnerability. We also propose a lightweight highlighting defense that reduces the fake reward ΔASR from 0.23 to 0.07 without degrading utility. These results establish prompt optimization pipelines as a first-class attack surface and motivate stronger safeguards for feedback channels and optimization frameworks.

Anthology ID: 2026.eacl-long.100
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2253–2272
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.100/
Cite (ACL): Andrew Zhao, Reshmi Ghosh, Vitor R. Carvalho, Emily Lawton, Keegan Hines, Gao Huang, and Jack W. Stokes. 2026. Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2253–2272, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers (Zhao et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.100.pdf