Yuqian Fu
2026
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods
Chenfei Liao | Wensong Wang | Zichen Wen | Xu Zheng | Yiyu Wang | Haocong He | Yuanhuiyi Lyu | Lutao Jiang | Xin Zou | Yuqian Fu | Bin Ren | Linfeng Zhang | Xuming Hu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent efforts to accelerate inference in Multimodal Large Language Models (MLLMs) have largely focused on visual token compression. The effectiveness of these methods is commonly evaluated by measuring the accuracy drop on existing MLLM benchmarks before and after compression. However, these benchmarks are originally designed to assess general perception and reasoning abilities, rather than the specific challenges posed by visual token compression, leading to a fundamental task mismatch. In this work, we uncover a counterintuitive yet consistent phenomenon: simple image downsampling outperforms many advanced visual token compression methods across multiple widely used benchmarks. Through a comprehensive empirical study spanning eight popular benchmarks and multiple state-of-the-art compression techniques, we show that (i) current benchmarks contain substantial noise (task-irrelevant samples) for evaluating visual token compression, and (ii) downsampling can act as an effective data filter that distinguishes between simple and difficult samples with respect to compression sensitivity. Motivated by these findings, we propose VTC-Bench, an evaluation framework that explicitly leverages downsampling as a discriminator to denoise existing benchmarks, enabling a fairer and more meaningful additional assessment of visual token compression methods.
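The filtering idea in the abstract above can be illustrated with a minimal sketch. It is not the VTC-Bench implementation: the grouping rule (a sample counts as "simple" if the model still answers it correctly from a downsampled image, and "difficult" otherwise) is an assumption inferred from the abstract, and the names `SampleResult`, `split_by_downsampling`, and `accuracy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    sample_id: str
    downsample_correct: bool  # was the answer correct with a downsampled image?
    method_correct: bool      # was the answer correct with the compression method?


def split_by_downsampling(results):
    """Assumed rule: downsampling acts as the discriminator between groups."""
    simple = [r for r in results if r.downsample_correct]
    difficult = [r for r in results if not r.downsample_correct]
    return simple, difficult


def accuracy(group):
    """Accuracy of the compression method on a non-empty group of samples."""
    if not group:
        raise ValueError("cannot compute accuracy on an empty group")
    return sum(r.method_correct for r in group) / len(group)


# Toy, made-up records purely for illustration (not results from the paper).
results = [
    SampleResult("a", downsample_correct=True, method_correct=True),
    SampleResult("b", downsample_correct=True, method_correct=False),
    SampleResult("c", downsample_correct=False, method_correct=True),
    SampleResult("d", downsample_correct=False, method_correct=False),
]

simple, difficult = split_by_downsampling(results)
print("simple:", accuracy(simple), "difficult:", accuracy(difficult))
```

Under this assumed rule, accuracy on the difficult group is the number that separates compression methods, since the simple group can be answered even after plain downsampling.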
2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
Yuqian Fu | Yuanheng Zhu | Jiajun Chai | Guojun Yin | Wei Lin | Qichao Zhang | Dongbin Zhao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose **R**einforcement **L**earning-**A**ssisted **E**nsemble for LLMs (RLAE), a novel framework that reformulates LLM ensemble through the lens of a Markov Decision Process (MDP). Our approach introduces an RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states, with the agent being trained using rewards that directly correspond to the quality of final outputs. We implement RLAE using both single-agent and multi-agent reinforcement learning algorithms (RLAE_PPO and RLAE_MAPPO), demonstrating substantial improvements over conventional ensemble methods. Extensive evaluations on a diverse set of tasks show that RLAE outperforms existing approaches by up to 3.3% accuracy points, offering a more effective framework for LLM ensembling. Furthermore, our method exhibits superior generalization capabilities across different tasks without the need for retraining, while simultaneously achieving lower latency. The source code is publicly available.
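To make the notion of ensemble weights in the abstract above concrete, here is a minimal sketch of one weighted combination step. The abstract does not say how the weights are applied, so mixing per-model next-token probability distributions is an assumption, and the weights are fixed by hand here rather than produced by a trained RL agent. The function names `normalize` and `mix_distributions` are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def mix_distributions(distributions, weights):
    """Weighted average of per-model next-token distributions.

    distributions: one probability list per model, all over the same vocabulary.
    weights: one non-negative weight per model (assumed to come from a policy).
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per model")
    w = normalize(weights)
    vocab_size = len(distributions[0])
    return [
        sum(w[m] * distributions[m][t] for m in range(len(distributions)))
        for t in range(vocab_size)
    ]


# Two hypothetical models over a three-token vocabulary.
dists = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
]
# Hand-picked weights standing in for the agent's output at one step.
mixed = mix_distributions(dists, [3.0, 1.0])
print(mixed)
```

In an RL-assisted setup as the abstract describes it, the weight vector would be recomputed from the input context and intermediate generation state instead of staying fixed.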