DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models

Lei Jiang, Zixun Zhang, Yuting Zeng, Chunzhao Xie, Tongxuan Liu, Zhen Li, Lechao Cheng, Xiaohua Xu


Abstract
Large Vision-Language Models (LVLMs) achieve remarkable performance on multimodal tasks but suffer from high computational costs due to the large number of visual tokens. Existing pruning methods either prune after the visual tokens enter the LLM or pre-prune based solely on visual attention. Neither balances efficiency and semantic alignment: post-pruning incurs redundant computation, while visual-only pre-pruning overlooks multimodal relevance. To address this limitation, we propose Dual-Cue Pruning (DCP), a novel cross-modal pruning framework that jointly considers textual semantics and visual self-attention. DCP consists of a text-aware computation module, which employs a gradient-weighted attention mechanism to strengthen text-visual alignment, and an image-aware computation module, which uses deep-layer self-attention distributions to retain essential structural information. By integrating both cues, DCP adaptively selects the most informative visual tokens, accelerating inference while maintaining strong task performance. Experimental results show that DCP can retain only 25% of the visual tokens with a performance degradation of only 0.063% on LLaVA-1.5-13B, demonstrating its effectiveness in balancing efficiency and accuracy.
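The abstract describes fusing a text-aware cue (gradient-weighted cross-attention) with an image-aware cue (deep-layer visual self-attention) and keeping the highest-scoring visual tokens. The following PyTorch sketch illustrates that selection step only; the function name, the equal cue weighting, the min-max normalization, and the exact input tensors are illustrative assumptions, not the paper's implementation.

```python
import torch

def dual_cue_prune(visual_tokens, text_attn, text_attn_grad, image_self_attn, keep_ratio=0.25):
    """
    Illustrative dual-cue token selection (sketch, not the authors' code).

    visual_tokens:   (N, d) visual token embeddings
    text_attn:       (N,)   text-to-visual cross-attention per visual token
    text_attn_grad:  (N,)   gradient of the task loss w.r.t. that attention (assumed input)
    image_self_attn: (N,)   deep-layer self-attention mass received by each visual token
    keep_ratio:      fraction of visual tokens to retain (e.g., 0.25)
    """
    # Text-aware cue: gradient-weighted attention emphasizes tokens relevant to the prompt.
    text_cue = text_attn * text_attn_grad.clamp(min=0)

    # Image-aware cue: deep-layer self-attention marks structurally important tokens.
    image_cue = image_self_attn

    # Normalize each cue to [0, 1] so neither dominates purely by scale.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-6)

    # Equal weighting of the two cues is an assumption made for this sketch.
    score = 0.5 * norm(text_cue) + 0.5 * norm(image_cue)

    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    keep_idx = score.topk(k).indices.sort().values  # preserve original token order
    return visual_tokens[keep_idx], keep_idx
```

With `keep_ratio=0.25`, the sketch retains a quarter of the visual tokens before they are passed to the LLM, mirroring the pre-pruning setting reported in the abstract.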
Anthology ID:
2025.emnlp-main.1074
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
21202–21215
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1074/
Cite (ACL):
Lei Jiang, Zixun Zhang, Yuting Zeng, Chunzhao Xie, Tongxuan Liu, Zhen Li, Lechao Cheng, and Xiaohua Xu. 2025. DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 21202–21215, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models (Jiang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1074.pdf
Checklist:
2025.emnlp-main.1074.checklist.pdf