Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More

Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang


Abstract

Vision tokens in multimodal large language models often dominate the computational overhead because their sequences are far longer than those of the language modality. Many recent methods address this problem with token pruning: they first define an importance criterion for tokens and then prune the unimportant vision tokens during inference. However, in this paper, we show that importance is not an ideal indicator of whether a token should be pruned. Surprisingly, importance-based pruning usually performs worse than random token pruning, and it is incompatible with efficient attention computation operators. Instead, we propose DART (Duplication-Aware Reduction of Tokens), which prunes tokens based on their duplication with other tokens, yielding significant, training-free acceleration. Concretely, DART selects a small subset of pivot tokens and then retains the tokens with low duplication to the pivots, ensuring minimal information loss during token pruning. Experiments demonstrate that DART can prune 88.9% of vision tokens while maintaining comparable performance, yielding 1.99× and 2.99× speed-ups in total time and in the prefilling stage, respectively, while remaining compatible with efficient attention operators.
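The abstract describes DART's selection rule only at a high level (pick a few pivot tokens, then keep the tokens least duplicated with them). The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation. The pivot criterion (largest L2 norm), the duplication measure (cosine similarity, taking each token's maximum over all pivots), the function name `dart_prune`, and the token counts and budgets used below are all hypothetical choices made for illustration.

```python
import numpy as np


def dart_prune(tokens: np.ndarray, num_pivots: int, keep: int) -> np.ndarray:
    """Hypothetical duplication-aware selection; returns sorted indices to keep.

    tokens: (N, D) array of vision-token features.
    ASSUMPTIONS (not stated in the abstract): pivots are the tokens with the
    largest L2 norm, and duplication is measured by cosine similarity.
    """
    n = tokens.shape[0]
    if not 0 < num_pivots <= keep <= n:
        raise ValueError("need 0 < num_pivots <= keep <= N")

    # Assumed pivot criterion: the num_pivots tokens with the largest L2 norm.
    norms = np.linalg.norm(tokens, axis=1)
    pivots = np.argsort(-norms, kind="stable")[:num_pivots]

    # Cosine similarity between every token and every pivot: shape (N, num_pivots).
    unit = tokens / np.maximum(norms[:, None], 1e-12)
    sim = unit @ unit[pivots].T

    # Duplication score: a token's highest similarity to any pivot.
    dup = sim.max(axis=1)
    dup[pivots] = np.inf  # pivots are always kept, so exclude them from ranking

    # Fill the remaining budget with the least-duplicated non-pivot tokens.
    candidates = np.argsort(dup, kind="stable")[: keep - num_pivots]
    return np.sort(np.concatenate([pivots, candidates]))


# Usage on random (hypothetical) features: 576 tokens, keep 64 of them.
rng = np.random.default_rng(0)
feats = rng.normal(size=(576, 64))
kept = dart_prune(feats, num_pivots=8, keep=64)
print(len(kept), 1 - len(kept) / 576)  # number kept, fraction pruned
```

With a hypothetical 576-token image and a budget of 64 tokens, the sketch keeps 64/576 ≈ 11.1% of the tokens, which matches the 88.9% pruning ratio reported in the abstract. Because this sketch scores tokens from their features alone, it never needs the attention weights that fused attention kernels typically do not expose; this is one plausible reading of the abstract's compatibility claim.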

Anthology ID:: 2025.emnlp-main.505
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 9972–9991
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.505/
Cite (ACL):: Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, and Linfeng Zhang. 2025. Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 9972–9991, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More (Wen et al., EMNLP 2025)
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.505.pdf
Checklist:: 2025.emnlp-main.505.checklist.pdf
