Xingle Xu

2026

Parameter-Efficient Fine-Tuning (PEFT) has become a popular alternative to Full-Parameter Fine-Tuning (FFT), achieving similar performance on many benchmarks with far lower computational and memory costs. Yet its effectiveness on complex tasks such as reasoning and instruction-following remains unclear. In this work, we provide a theoretical and empirical comparison of PEFT and FFT in terms of representational capacity and robustness. We show that PEFT’s solution space is a strict subset of FFT’s and derive upper bounds revealing how its restricted parameterization limits expressiveness and increases vulnerability to perturbations. Experiments on 20 datasets and 11 adversarial test sets support these findings, indicating that while PEFT performs well on standard tasks, its weaknesses in complex and adversarial settings call for new directions beyond current PEFT paradigms. The source code is available in the anonymous GitHub repository at https://anonymous.4open.science/r/PEFTEval-E2AC.
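
The strict-subset claim can be illustrated with one concrete PEFT method. The abstract does not give the paper's formal statement, so the following is only a sketch under the assumption that PEFT is instantiated as a rank-r low-rank update (as in LoRA) of a frozen weight matrix W_0; the symbols W_0, B, A, d, k, and r are introduced here for illustration.

```latex
\[
\mathcal{S}_{\mathrm{LoRA}}
  = \{\, W_0 + BA : B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k} \,\}
  \subsetneq
\mathcal{S}_{\mathrm{FFT}}
  = \{\, W_0 + \Delta W : \Delta W \in \mathbb{R}^{d \times k} \,\},
\qquad r < \min(d, k).
\]
```

The inclusion is strict because rank(BA) is at most r, so any update of rank greater than r is reachable by full fine-tuning but not by the low-rank parameterization.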


Multimodal Sentiment Analysis aims to integrate information from various modalities to make complementary predictions. However, it often struggles with irrelevant or misleading visual and auditory information. Most existing approaches treat an entire modality as an independent unit for feature enhancement or denoising, which often suppresses redundant noise at the cost of weakening critical information. To address this challenge, we propose MoLAN, a unified ModaLity-aware noise dynAmic editiNg framework. Specifically, MoLAN performs modality-aware block partitioning by dividing the features of each modality into multiple blocks. Each block is then dynamically assigned a distinct denoising strength based on its noise level and semantic relevance, enabling fine-grained noise suppression while preserving essential multimodal information. Notably, MoLAN is a unified and flexible framework that can be seamlessly integrated into a wide range of multimodal models. Building upon this framework, we further introduce MoLAN+, a new multimodal sentiment analysis approach. Experiments across five models and four datasets demonstrate the broad effectiveness of the MoLAN framework. Extensive evaluations show that MoLAN+ achieves state-of-the-art performance.
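
The idea of partitioning a modality's features into blocks and denoising each block with its own strength can be sketched as follows. This is not MoLAN's implementation: the abstract does not say how blocks are formed, how strengths are computed, or which denoiser is used. The function names, the contiguous equal-size split, the hand-supplied strengths, and the shrink-toward-block-mean denoiser are all assumptions made for illustration.

```python
def partition_blocks(features, num_blocks):
    """Split a feature vector into num_blocks contiguous blocks of near-equal size.

    Assumption: contiguous splitting; the source does not specify the partitioning.
    """
    base, extra = divmod(len(features), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more item
        blocks.append(features[start:start + size])
        start += size
    return blocks


def denoise_blockwise(features, strengths):
    """Apply a per-block denoising strength in [0, 1] to a feature vector.

    Assumption: the denoiser is a stand-in that shrinks each value toward its
    block mean; strength 0 leaves the block unchanged, strength 1 replaces
    every value with the block mean.
    """
    blocks = partition_blocks(features, len(strengths))
    out = []
    for block, s in zip(blocks, strengths):
        mean = sum(block) / len(block) if block else 0.0
        out.extend((1 - s) * x + s * mean for x in block)
    return out


# Hypothetical example: the first block is treated as clean, the second as noisy.
result = denoise_blockwise([1.0, 3.0, 10.0, 20.0], [0.0, 1.0])
```

In a real system the strengths would be predicted per block from estimated noise level and semantic relevance rather than supplied by hand.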


Multimodal Large Language Models (MLLMs) integrate visual encoders with Large Language Models (LLMs) and enable multimodal reasoning. However, for tasks that rely heavily on visual information, the model’s utilization of that information remains unstable, which leads to reasoning failures. Prior works mainly strengthen multimodal reasoning by improving representation alignment or increasing computation. However, these methods do not explicitly characterize the differences in visual demands across tasks, making it difficult for the model to decide where and how strongly to attend to visual information. Consequently, visual attention allocation becomes a key factor that affects multimodal reasoning. To address this, we propose RATION, an entropy-driven task-adaptive visual attention allocation framework. First, we use a task routing strategy to infer the task type of each sample and identify the key layers. Then, we use visual attention entropy as a control signal to dynamically allocate attention according to task demands. Experiments show that RATION achieves consistent performance gains across diverse reasoning tasks, datasets, and models, providing a clear direction toward more reliable multimodal reasoning.
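
As a rough illustration of the control signal mentioned above, the sketch below computes the Shannon entropy of one query's attention over visual tokens only. It is not RATION's implementation: the abstract does not define the entropy precisely or say how it drives allocation, so the function name, the renormalization over visual tokens, and the use of natural logarithms are assumptions.

```python
import math


def visual_attention_entropy(attn_weights, visual_mask):
    """Shannon entropy (in nats) of attention restricted to visual tokens.

    attn_weights: non-negative attention weights of one query over all tokens.
    visual_mask:  booleans, True where the token is a visual token.
    Assumption: weights on visual tokens are renormalized to sum to 1.
    """
    visual = [w for w, is_vis in zip(attn_weights, visual_mask) if is_vis]
    total = sum(visual)
    if total <= 0:
        return 0.0  # no attention mass on visual tokens
    probs = [w / total for w in visual]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical example: four visual tokens followed by two text tokens.
mask = [True, True, True, True, False, False]
diffuse = visual_attention_entropy([0.1, 0.1, 0.1, 0.1, 0.3, 0.3], mask)
focused = visual_attention_entropy([0.4, 0.0, 0.0, 0.0, 0.3, 0.3], mask)
```

High entropy means attention is spread evenly across visual tokens, while low entropy means it is concentrated on a few; a controller could use that difference to decide whether a layer needs its visual attention sharpened or broadened.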


Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. However, existing methods struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging. We prove that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. Through Gaussian Width analysis, we show that marginal benefits diminish according to a strictly concave function as more models are merged. Using Approximate Kinematics Theory, we further prove the existence of a unique optimal threshold beyond which additional models yield negligible improvements. To address this limitation, we propose a straightforward Reparameterized Heavy-Tailed method to extend the merged model’s coverage and enhance performance. Empirical results on 19 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging.

Co-authors

Qian Li 1

Venues

ACL 2
Findings 2
