Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion

Zhijie Du, Daizong Liu, Pan Zhou


Abstract
Text embedding not only serves as a core component of modern NLP models but also plays a pivotal role in multimodal systems such as text-to-image (T2I) models, where it enables user-friendly image generation from natural language instructions. However, this convenience also introduces additional risks. Misalignment in T2I models, whether caused by unintentional user inputs or targeted attacks, can negatively impact the reliability and ethics of these models. In this paper, we introduce TEOI, a framework that fully considers the continuity and distribution characteristics of text embeddings. The framework directly optimizes the embeddings using gradient-based methods and then inverts them to obtain misaligned prompts of discrete tokens. TEOI is capable of conducting both text-modal and multimodal misalignment attacks, revealing the vulnerabilities of multimodal models that rely on text embeddings. Our work highlights the potential risks associated with embedding-based text representations in prevailing T2I models and provides a foundation for further research into robust and secure text-to-image generation systems.
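The two stages named in the abstract, continuous optimization of embeddings followed by inversion to discrete tokens, can be illustrated with a toy sketch. This is not the TEOI implementation: the squared-distance objective, the random vocabulary matrix, the nearest-neighbor (cosine) inversion, and the function names `optimize_embedding` and `invert_to_tokens` are all assumptions made for illustration only.

```python
import numpy as np


def optimize_embedding(start, target, steps=200, lr=0.1):
    """Gradient descent on continuous embeddings.

    Toy objective (an assumption): squared distance to `target`.
    A real attack would use a model-dependent loss instead.
    """
    e = start.copy()
    for _ in range(steps):
        grad = 2.0 * (e - target)  # gradient of ||e - target||^2 w.r.t. e
        e -= lr * grad
    return e


def invert_to_tokens(embeddings, vocab_matrix):
    """Map each continuous vector to the index of the vocabulary
    embedding with the highest cosine similarity (assumed inversion rule)."""
    a = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    b = vocab_matrix / np.linalg.norm(vocab_matrix, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


# Hypothetical 50-token vocabulary with 16-dimensional embeddings.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 16))

start = vocab[[1, 2, 3]]      # embeddings of an initial 3-token prompt
target = vocab[[10, 20, 30]]  # toy optimization target

optimized = optimize_embedding(start, target)
tokens = invert_to_tokens(optimized, vocab)
print(tokens.tolist())
```

In this toy setting the optimized vectors converge to the target rows, so the inversion recovers the target token indices. With a realistic loss the optimized vectors generally fall between vocabulary entries, which is why the inversion step matters.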
Anthology ID: 2025.findings-emnlp.1200
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 22015–22032
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1200/
DOI: 10.18653/v1/2025.findings-emnlp.1200
Cite (ACL): Zhijie Du, Daizong Liu, and Pan Zhou. 2025. Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22015–22032, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion (Du et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1200.pdf
Checklist: 2025.findings-emnlp.1200.checklist.pdf