Zijing Wang
2026
Look Within or Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning
YongKang Liu | Xingle Xu | Ercong Nie | Zijing Wang | Shi Feng | Daling Wang | Qian Li | Hinrich Schuetze
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Parameter-Efficient Fine-Tuning (PEFT) has become a popular alternative to Full-Parameter Fine-Tuning (FFT), achieving similar performance on many benchmarks with far lower computational and memory costs. Yet its effectiveness on complex tasks such as reasoning and instruction-following remains unclear. In this work, we provide a theoretical and empirical comparison of PEFT and FFT in terms of representational capacity and robustness. We show that PEFT’s solution space is a strict subset of FFT’s and derive upper bounds revealing how its restricted parameterization limits expressiveness and increases vulnerability to perturbations. Experiments on 20 datasets and 11 adversarial test sets support these findings, indicating that while PEFT performs well on standard tasks, its weaknesses in complex and adversarial settings call for new directions beyond current PEFT paradigms. The source code is available in the anonymous GitHub repository (https://anonymous.4open.science/r/PEFTEval-E2AC).
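The strict-subset claim can be illustrated with a toy example. The sketch below is not the paper's proof: it assumes a LoRA-style low-rank update (the abstract does not name a specific PEFT method) on a hypothetical 2x2 weight matrix, and all function names are illustrative. Every rank-1 update has zero determinant, so an update equal to the identity matrix, which full fine-tuning can reach, lies outside the rank-1 solution space.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def low_rank_update(b_col, a_row):
    """Rank-1 update dW = B A, with B of shape (2, 1) and A of shape (1, 2).

    Assumption: this LoRA-style form stands in for PEFT in general.
    """
    return matmul([[b_col[0]], [b_col[1]]], [a_row])

random.seed(0)
dets = []
for _ in range(1000):
    b = [random.uniform(-1, 1) for _ in range(2)]
    a = [random.uniform(-1, 1) for _ in range(2)]
    dets.append(abs(det2(low_rank_update(b, a))))

# Largest determinant magnitude seen over all sampled rank-1 updates
# (zero up to floating-point rounding).
max_rank1_det = max(dets)

# A full fine-tuning update may be any 2x2 matrix, e.g. the identity.
full_update = [[1.0, 0.0], [0.0, 1.0]]
full_det = det2(full_update)
```

Because the determinant of any rank-1 update vanishes while `full_det` is 1, no choice of `b` and `a` reproduces `full_update`, which is the sense in which the restricted parameterization covers only part of the full solution space.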
PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs
Zijing Wang | YongKang Liu | Mingyang Wang | Ercong Nie | Deyuan Chen | Zhengjie Zhao | Shi Feng | Daling Wang | Xiaocui Yang | Yifei Zhang | Hinrich Schuetze
Findings of the Association for Computational Linguistics: ACL 2026
Multimodal Large Language Models (MLLMs) rely on strong linguistic reasoning inherited from their base language models. However, multimodal instruction fine-tuning paradoxically degrades this text-based reasoning capability, undermining multimodal performance. We propose a training-free framework to mitigate this degradation. Through layer-wise vision token masking, we reveal a common three-stage pattern in MLLMs: early-modal separation, mid-modal alignment, and late-modal degradation. By analyzing the behavior of MLLMs at different stages, we propose a plateau-guided model merging method that selectively injects base language model parameters into MLLMs. Experiments with five MLLMs on nine benchmarks demonstrate the effectiveness of our method. Attention-based analysis further reveals that merging shifts attention from diffuse, scattered patterns to focused localization on task-relevant visual regions. Our repository is available at https://github.com/wzj1718/PlaM.
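The idea of selectively injecting base language model parameters can be sketched as a layer-wise interpolation. This is a minimal illustration under stated assumptions, not the PlaM procedure: the abstract does not specify how layers are chosen or how weights are combined, so the function `merge_layers`, the `alpha` coefficient, and the choice of which layers to merge are all hypothetical.

```python
def merge_layers(mllm, base, layers_to_merge, alpha=0.5):
    """Interpolate selected layers of an MLLM toward its base language model.

    mllm, base: dicts mapping layer names to flat lists of weights.
    layers_to_merge: names of the layers that receive base-model parameters
        (hypothetical; how these are selected is not given in the abstract).
    alpha: weight on the base model, 0 keeps the MLLM, 1 copies the base.
    """
    merged = {}
    for name, weights in mllm.items():
        if name in layers_to_merge:
            merged[name] = [(1 - alpha) * w + alpha * b
                            for w, b in zip(weights, base[name])]
        else:
            # Layers outside the selected set are left unchanged.
            merged[name] = list(weights)
    return merged

# Toy weights standing in for real checkpoints.
mllm = {"layer0": [1.0, 2.0], "layer1": [3.0, 4.0], "layer2": [5.0, 6.0]}
base = {"layer0": [0.0, 0.0], "layer1": [1.0, 2.0], "layer2": [1.0, 2.0]}

# Assumption for illustration: only the last layer is merged.
merged = merge_layers(mllm, base, layers_to_merge={"layer2"}, alpha=0.5)
```

No gradient step is involved, which matches the training-free framing; only the selected layer changes, and the others are copied as-is.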
Why Do More Experts Fail? A Theoretical Analysis of Model Merging
Zijing Wang | Xingle Xu | YongKang Liu | Yiqun Zhang | Peiqin Lin | Shi Feng | Daling Wang | Xiaocui Yang | Hinrich Schuetze
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Model merging dramatically reduces storage and computational costs by combining multiple expert models into a single multi-task model. However, existing methods struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging. We prove that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. Through Gaussian Width analysis, we show that marginal benefits diminish according to a strictly concave function as more models are merged. Using Approximate Kinematics Theory, we further prove the existence of a unique optimal threshold beyond which additional models yield negligible improvements. To address this limitation, we propose a straightforward Reparameterized Heavy-Tailed method to extend the merged model’s coverage and enhance performance. Empirical results on 19 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging.