Zhongliang Zhou
2026
Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods
Hanzhang Yuan | Mengxuan Hu | Wenhao Zhang | Tianlong Wang | Zhongliang Zhou | Jiasen Lu | Sheng Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Vision-Language Models (LVLMs) excel at visual understanding but face severe computational bottlenecks when processing high-resolution images and long videos, because these inputs produce massive visual token counts. Token pruning mitigates this by selectively removing less informative tokens while maintaining performance. However, existing methods vary widely in pruning location (vision encoder vs. LLM decoder), importance criteria (attention vs. similarity vs. learned scores), and application strategy, and they have not been systematically compared. This survey presents the first comprehensive review of token pruning for LVLMs. We propose a taxonomy that categorizes methods into vision-side, LLM-side, and hybrid paradigms, and we systematically analyze token selection mechanisms and pruning strategies. We further discuss evaluation protocols and identify key challenges, including prompt-adaptive pruning and hardware-aware design. Our survey provides a structured foundation for this rapidly growing research area.