Mingli Song
2026
Evolutionary Negative Module Pruning for Better LoRA Merging
Anda Cao | Zhuo Gou | Yi Wang | Kaixuan Chen | Yu Wang | Can Wang | Mingli Song | Jie Song
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Merging multiple Low-Rank Adaptation (LoRA) experts into a single backbone is a promising approach for efficient multi-task deployment. While existing methods strive to alleviate interference via weight interpolation or subspace alignment, they rest upon the implicit assumption that all LoRA matrices contribute constructively to the merged model. In this paper, we uncover a critical bottleneck in current merging paradigms: the existence of negative modules—specific LoRA layers that inherently degrade global performance upon merging. We propose Evolutionary Negative Module Pruning (ENMP), a plug-and-play LoRA pruning method to locate and exclude these detrimental modules prior to merging. By leveraging an evolutionary search strategy, ENMP effectively navigates the discrete, non-differentiable landscape of module selection to identify optimal pruning configurations. Extensive evaluations demonstrate that ENMP consistently boosts the performance of existing merging algorithms, achieving a new state-of-the-art across both language and vision domains. Code is available at https://github.com/CaoAnda/ENMP-LoRAMerging.
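The search described in the abstract above, an evolutionary search over a discrete keep/prune choice per LoRA module, can be illustrated with a minimal sketch. This is not the ENMP implementation: the function `evolve_mask`, the toy fitness, and the per-module scores are all hypothetical, and a real fitness would be a validation score of the merged model under the candidate mask.

```python
import random

def evolve_mask(fitness, n_modules, pop_size=20, generations=50,
                mut_rate=0.1, seed=0):
    """Mutation-only evolutionary search over binary keep/prune masks.

    Hypothetical sketch: 1 keeps a module, 0 prunes it before merging.
    """
    rng = random.Random(seed)
    # Seed the population with the "keep everything" baseline plus random masks.
    population = [tuple([1] * n_modules)]
    while len(population) < pop_size:
        population.append(tuple(rng.randint(0, 1) for _ in range(n_modules)))
    for _ in range(generations):
        # Elitism: the fitter half survives unchanged into the next generation.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Each survivor produces one child by independent bit flips.
        children = [
            tuple(bit ^ 1 if rng.random() < mut_rate else bit for bit in parent)
            for parent in parents
        ]
        population = parents + children
    return max(population, key=fitness)

# Assumed toy setting: each module has a fixed additive contribution,
# and modules with a negative value play the role of "negative modules".
contributions = [0.5, -0.3, 0.2, -0.1, 0.4, 0.1]

def toy_fitness(mask):
    return sum(c * m for c, m in zip(contributions, mask))

best = evolve_mask(toy_fitness, n_modules=len(contributions))
print(best, toy_fitness(best))
```

Because the all-ones mask is in the initial population and the fitter half always survives, the returned mask never scores below the unpruned baseline under the chosen fitness. A real fitness is not additive across modules, which is why a black-box search is used rather than a per-module test.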
Token-level Inference-Time Alignment for Vision-Language Models
Kejia Chen | Junjun Zheng | Jiawen Zhang | Manxi Lin | Xiao Pan | Jiacong Hu | Jian Lou | Zunlei Feng | Mingli Song
Findings of the Association for Computational Linguistics: ACL 2026
Vision-Language Models (VLMs) often prioritize linguistic fluency over visual fidelity, leading to hallucinations where generated text contradicts the image. Countering this bias typically requires resource-heavy fine-tuning or high-latency verification methods that provide feedback only after the full response is generated. To overcome these limitations, we present a framework for Token-level Inference-Time Alignment (TITA) that steers the decoding process without updating the base model parameters. By training a lightweight reward model to capture visual preferences, TITA extracts implicit guidance through log-probability ratios. This approach functions as an inference-time adaptation of Direct Preference Optimization (DPO), injecting dense feedback to correct the output distribution at every generation step. Across diverse architectures including LLaVA-1.5, Qwen3-VL, and InternVL3.5, TITA consistently improves performance on 13 benchmarks. For example, TITA boosts LLaVA-1.5-7B by 8.6% on MMVet and achieves a 74.0 MMStar score with Qwen3-VL-8B. Notably, these gains incur negligible overhead (~0.2s per query), offering a superior trade-off between alignment effectiveness and efficiency. Our code is available at: https://github.com/Thecommonirin/TITA.
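The idea of steering each decoding step with a log-probability ratio can be sketched as follows. The abstract does not give the exact formula, so this is one plausible form only: the function `guided_distribution`, the `weight` parameter, and the three toy distributions are assumptions, not the TITA implementation. The sketch adds a weighted log-ratio between a preference-tuned small model and its reference to the base model's next-token log-probabilities, then renormalizes.

```python
import math

def guided_distribution(base_logp, tuned_logp, ref_logp, weight=1.0):
    """Reweight base next-token log-probs by a log-probability ratio.

    Hypothetical sketch: score = base + weight * (tuned - ref), then softmax.
    The ratio term mirrors the implicit reward form used in DPO.
    """
    scores = [
        b + weight * (t - r)
        for b, t, r in zip(base_logp, tuned_logp, ref_logp)
    ]
    # Numerically stable softmax over the adjusted scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed toy next-token distributions over a three-token vocabulary.
base_probs = [0.5, 0.3, 0.2]    # frozen base VLM
tuned_probs = [0.2, 0.6, 0.2]   # small preference-tuned model
ref_probs = [0.4, 0.4, 0.2]     # its reference before tuning

to_logp = lambda probs: [math.log(p) for p in probs]
guided = guided_distribution(
    to_logp(base_probs), to_logp(tuned_probs), to_logp(ref_probs), weight=1.0
)
unguided = guided_distribution(
    to_logp(base_probs), to_logp(tuned_probs), to_logp(ref_probs), weight=0.0
)
print(guided)
```

With `weight=0.0` the base distribution is returned unchanged, so the ratio term acts purely as a correction applied at each step, with no update to the base model's parameters.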