The Security Threat of Compressed Projectors in Large Vision-Language Models

Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang


Abstract
The choice of a suitable vision-language projector (VLP) is critical to the successful training of large vision-language models (LVLMs). Mainstream VLPs can be broadly categorized into compressed and uncompressed projectors, each offering distinct advantages in performance and computational efficiency. However, their security implications have not been thoroughly examined. Our comprehensive evaluation reveals significant differences in their security profiles: compressed projectors exhibit substantial vulnerabilities, allowing adversaries to successfully compromise LVLMs even with minimal knowledge of the model's structure. In stark contrast, uncompressed projectors demonstrate robust security properties and do not introduce additional vulnerabilities. These findings provide critical guidance for researchers in selecting VLPs that enhance the security and reliability of vision-language models. The code is available at https://github.com/btzyd/TCP.
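For intuition, the following is a minimal PyTorch sketch of the two projector families the abstract contrasts. All class names, dimensions, and hyperparameters (e.g., num_queries=64, the hidden sizes) are illustrative assumptions, not the architectures evaluated in the paper: the uncompressed variant is an MLP that maps visual tokens one-to-one into the LLM embedding space, while the compressed variant uses a small set of learned queries that cross-attend to the visual tokens, shrinking the token sequence.

import torch
import torch.nn as nn

class UncompressedProjector(nn.Module):
    """MLP-style projector: maps every visual token into the LLM
    embedding space one-to-one, preserving the token count."""
    def __init__(self, vis_dim=1024, llm_dim=4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vis_tokens):           # (B, N, vis_dim)
        return self.mlp(vis_tokens)          # (B, N, llm_dim)

class CompressedProjector(nn.Module):
    """Resampler-style projector: a fixed set of learned queries
    cross-attends to the N visual tokens, compressing them down to
    num_queries tokens before projection into the LLM space."""
    def __init__(self, vis_dim=1024, llm_dim=4096, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, vis_dim))
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, vis_tokens):                     # (B, N, vis_dim)
        q = self.queries.unsqueeze(0).expand(vis_tokens.size(0), -1, -1)
        out, _ = self.attn(q, vis_tokens, vis_tokens)  # (B, num_queries, vis_dim)
        return self.proj(out)                          # (B, num_queries, llm_dim)

# Token counts illustrate the compression (576 patch tokens is a
# typical ViT output; the exact number is an assumption here):
x = torch.randn(1, 576, 1024)
print(UncompressedProjector()(x).shape)   # torch.Size([1, 576, 4096])
print(CompressedProjector()(x).shape)     # torch.Size([1, 64, 4096])

The compression is what matters for the paper's finding: the compressed projector discards the one-to-one token correspondence, which (per the abstract) is the component adversaries can exploit with minimal structural knowledge.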
Anthology ID:
2025.findings-emnlp.1111
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20397–20407
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1111/
DOI:
10.18653/v1/2025.findings-emnlp.1111
Cite (ACL):
Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, and Yu Wang. 2025. The Security Threat of Compressed Projectors in Large Vision-Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20397–20407, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
The Security Threat of Compressed Projectors in Large Vision-Language Models (Zhang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1111.pdf
Checklist:
2025.findings-emnlp.1111.checklist.pdf