Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
Changhai Zhou, Shiyang Zhang, Yuhua Zhou, Jun Gao, Qian Qiao, Shichao Weng, Weizhong Zhang, Cheng Jin
Abstract
Deploying and fine-tuning Large Language Models (LLMs) on resource-constrained edge devices requires navigating a strict trade-off between memory footprint and task performance. Existing quantization-aware fine-tuning methods typically decouple weight precision and adapter capacity, overlooking that a layer’s ability to adapt is constrained by the information preserved in its frozen weights. Layers that are highly sensitive to quantization—whether due to representational specialization or accumulated error propagation—can become bottlenecks that adapter rank alone cannot recover. To address this issue, we introduce QR-Adaptor, a unified framework that jointly optimizes per-layer quantization bit-width and LoRA rank. We formulate resource allocation as a multi-objective discrete search guided by empirical layer-wise sensitivity, and implement it with a three-stage pipeline comprising KL-based sensitivity profiling, evolutionary exploration, and Bayesian refinement. Extensive experiments across LLaMA and Qwen models, including modern instruction tuning on OpenOrca and comparisons with strong PEFT baselines such as QDoRA, show that QR-Adaptor establishes a strong Pareto frontier: under a strict 4-bit memory budget, it matches or approaches 16-bit baselines while using substantially less memory.- Anthology ID:
- 2026.findings-acl.779
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 15885–15896
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.779/
- DOI:
- Cite (ACL):
- Changhai Zhou, Shiyang Zhang, Yuhua Zhou, Jun Gao, Qian Qiao, Shichao Weng, Weizhong Zhang, and Cheng Jin. 2026. Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies. In Findings of the Association for Computational Linguistics: ACL 2026, pages 15885–15896, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies (Zhou et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.779.pdf