Zhijie Du


2025

Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion
Zhijie Du | Daizong Liu | Pan Zhou
Findings of the Association for Computational Linguistics: EMNLP 2025

Text embeddings are a core component of modern NLP models and also play a pivotal role in multimodal systems such as text-to-image (T2I) models, where they enable user-friendly image generation from natural language instructions. This convenience, however, introduces additional risks. Misalignment in T2I models, whether caused by unintentional user inputs or by targeted attacks, can undermine the reliability and ethics of these models. In this paper, we introduce TEOI, a framework that fully considers the continuity and distribution characteristics of text embeddings. It directly optimizes the embeddings with gradient-based methods and then inverts them to obtain misaligned prompts composed of discrete tokens. TEOI can conduct both text-modal and multimodal misalignment attacks, revealing the vulnerabilities of multimodal models that rely on text embeddings. Our work highlights the potential risks associated with embedding-based text representations in prevailing T2I models and provides a foundation for further research into robust and secure text-to-image generation systems.