Cheng Jin
2026
Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
Changhai Zhou | Shiyang Zhang | Yuhua Zhou | Jun Gao | Qian Qiao | Shichao Weng | Weizhong Zhang | Cheng Jin
Findings of the Association for Computational Linguistics: ACL 2026
Deploying and fine-tuning Large Language Models (LLMs) on resource-constrained edge devices requires navigating a strict trade-off between memory footprint and task performance. Existing quantization-aware fine-tuning methods typically decouple weight precision and adapter capacity, overlooking that a layer’s ability to adapt is constrained by the information preserved in its frozen weights. Layers that are highly sensitive to quantization—whether due to representational specialization or accumulated error propagation—can become bottlenecks that adapter rank alone cannot recover. To address this issue, we introduce QR-Adaptor, a unified framework that jointly optimizes per-layer quantization bit-width and LoRA rank. We formulate resource allocation as a multi-objective discrete search guided by empirical layer-wise sensitivity, and implement it with a three-stage pipeline comprising KL-based sensitivity profiling, evolutionary exploration, and Bayesian refinement. Extensive experiments across LLaMA and Qwen models, including modern instruction tuning on OpenOrca and comparisons with strong PEFT baselines such as QDoRA, show that QR-Adaptor establishes a strong Pareto frontier: under a strict 4-bit memory budget, it matches or approaches 16-bit baselines while using substantially less memory.
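The abstract names KL-based sensitivity profiling and a multi-objective search that yields a Pareto frontier, but gives no implementation details. The following is a minimal sketch of those two ideas only, under stated assumptions: sensitivity is taken to be the KL divergence between a reference output distribution and the one produced after quantizing a layer, and each candidate configuration is summarized by a (memory, loss) pair where lower is better for both. The function names `layer_sensitivity` and `pareto_front` and all numbers are hypothetical illustrations, not the authors' code or results.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def layer_sensitivity(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """Assumed sensitivity score: KL between reference and quantized outputs.

    A larger value means quantizing this layer shifts the output
    distribution more, so the layer may deserve more bits.
    """
    return kl_divergence(softmax(ref_logits), softmax(quant_logits))


def pareto_front(candidates: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (memory, loss) pairs, sorted by memory.

    A candidate is dominated if another one is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, (m_i, l_i) in enumerate(candidates):
        dominated = any(
            (m_j <= m_i and l_j <= l_i) and (m_j < m_i or l_j < l_i)
            for j, (m_j, l_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((m_i, l_i))
    return sorted(front)


# Hypothetical example: four candidate configurations as (memory, loss).
configs = [(4.0, 0.5), (8.0, 0.3), (8.0, 0.6), (16.0, 0.3)]
front = pareto_front(configs)
```

In this toy example, `(8.0, 0.6)` and `(16.0, 0.3)` are both dominated by `(8.0, 0.3)`, so only two configurations remain on the frontier. An evolutionary or Bayesian search, as mentioned in the abstract, would generate and score the candidates; that part is not sketched here.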