Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning

Mohit Raghavendra, Junmo Kang, Alan Ritter


Abstract
Post-training of Large Language Models often involves a pipeline of Supervised Finetuning (SFT) followed by Preference Finetuning (PFT) using methods like Direct Preference Optimization (DPO). Both stages require annotated data, but the two kinds of data differ substantially in structure and cost. We study how to optimally allocate a fixed training data budget between the two stages through extensive experiments spanning four diverse tasks, multiple model sizes, and various data annotation costs. Our findings reveal that SFT alone on the base model dominates performance in low-data regimes (<1,000 annotated examples). With larger data budgets, a combination of SFT and PFT, often with an increasing portion of the budget allocated to preference data, yields optimal performance. However, eliminating SFT entirely and running PFT directly on the base model is suboptimal on tasks like mathematics, a phenomenon we describe as the cold-start problem. We observe that this is due to the distribution shift that arises when DPO is applied directly to the base model to elicit step-by-step reasoning. This limitation can be effectively addressed by allocating even a small portion (<10%) of the budget to SFT first, yielding performance improvements of 15-20% on analytical benchmarks like GSM8K. These results provide actionable insights for researchers and practitioners optimizing model development under budget constraints, where high-quality data curation often represents a significant portion of total development costs.
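To make the budget-allocation idea concrete, here is a minimal sketch (not taken from the paper; all names, costs, and the split fraction are illustrative assumptions) of how a fixed annotation budget might be divided between SFT demonstrations and preference pairs for DPO:

```python
# Hypothetical sketch: splitting a fixed annotation budget between
# SFT examples and preference (DPO) pairs. Costs and names are illustrative.

from dataclasses import dataclass


@dataclass
class BudgetSplit:
    sft_examples: int        # number of single-response SFT demonstrations
    preference_pairs: int    # number of chosen/rejected pairs for DPO


def allocate_budget(total_budget: float,
                    cost_per_sft_example: float,
                    cost_per_preference_pair: float,
                    sft_fraction: float) -> BudgetSplit:
    """Split a fixed annotation budget between SFT and preference data.

    sft_fraction is the share of the budget spent on SFT; the abstract's
    finding suggests keeping it small but nonzero (e.g. <10%) at larger
    budgets, to avoid the cold-start problem.
    """
    if not 0.0 <= sft_fraction <= 1.0:
        raise ValueError("sft_fraction must lie in [0, 1]")
    sft_budget = total_budget * sft_fraction
    pft_budget = total_budget - sft_budget
    return BudgetSplit(
        sft_examples=int(sft_budget // cost_per_sft_example),
        preference_pairs=int(pft_budget // cost_per_preference_pair),
    )


# Example: 10,000 budget units, preference pairs twice as costly as SFT
# examples, 10% of the budget reserved for SFT.
print(allocate_budget(10_000, cost_per_sft_example=1.0,
                      cost_per_preference_pair=2.0, sft_fraction=0.1))
```

The resulting counts would then determine how many examples to annotate for each stage before running SFT followed by DPO; the actual cost ratios and optimal fractions are task-dependent, as the paper's experiments show.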
Anthology ID:
2025.acl-long.1248
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
25702–25720
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1248/
Cite (ACL):
Mohit Raghavendra, Junmo Kang, and Alan Ritter. 2025. Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25702–25720, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning (Raghavendra et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1248.pdf