Amir Aminifar


2026

Fine-tuning the bias terms of large language models (LLMs) has the potential to achieve unprecedented parameter efficiency while maintaining competitive performance, particularly in low-data regimes. However, the link between fine-tuning different bias terms (i.e., bq, bk, and bv in the query, key, and value projections, respectively) and downstream performance remains largely unclear. In this paper, we investigate how fine-tuning each of bq, bk, and bv affects downstream task performance. Our key finding is that *directly fine-tuning bv generally leads to higher downstream performance in low-data regimes than fine-tuning bq or bk*. We extensively evaluate this property across a wide range of LLMs spanning encoder-only and decoder-only architectures with up to 6.7B parameters (including bias-free LLMs). Our results provide strong evidence for the effectiveness of directly fine-tuning bv across various downstream tasks. The implementation code is available at https://github.com/whubaichuan/BEFT.
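As a rough illustration of what restricting fine-tuning to bv could look like in practice, the sketch below marks only value-projection biases as trainable, based on parameter names. It is not taken from the released implementation: the name patterns, the helper functions `is_value_bias` and `trainable_mask`, and the example parameter names are all assumptions made for this example, and real checkpoints may name their parameters differently.

```python
import re

# Assumed naming patterns for value-projection biases (hypothetical defaults;
# adjust them to the parameter names of the model being fine-tuned).
VALUE_BIAS_PATTERNS = (
    r"\.value\.bias$",   # assumed encoder-style naming
    r"\.v_proj\.bias$",  # assumed decoder-style naming
)


def is_value_bias(name, patterns=VALUE_BIAS_PATTERNS):
    """Return True if a parameter name looks like a value-projection bias."""
    return any(re.search(p, name) for p in patterns)


def trainable_mask(param_names, patterns=VALUE_BIAS_PATTERNS):
    """Map each parameter name to True (train) or False (freeze)."""
    return {name: is_value_bias(name, patterns) for name in param_names}


# Made-up parameter names for a single attention layer, for illustration only.
example_names = [
    "layers.0.self_attn.q_proj.weight",
    "layers.0.self_attn.q_proj.bias",
    "layers.0.self_attn.k_proj.bias",
    "layers.0.self_attn.v_proj.weight",
    "layers.0.self_attn.v_proj.bias",
]

mask = trainable_mask(example_names)
trainable = [name for name, keep in mask.items() if keep]
print(trainable)
```

With a PyTorch model, the same mask could be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` to the corresponding mask value, so that the optimizer updates only bv.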