DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections
Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, Yeyun Gong
Abstract
As numerous instruction-tuning datasets continue to emerge, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model’s performance at its current state. We demonstrate that DynamixSFT effectively optimizes the TÜLU-2-mixture and TÜLU-3-mixture collections across 10 benchmarks, while introducing minimal computational overhead over naive sampling. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.
- Anthology ID:
- 2026.findings-acl.1972
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 39590–39603
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1972/
- Cite (ACL):
- Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, and Yeyun Gong. 2026. DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39590–39603, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections (Shin et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1972.pdf
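
The abstract describes a bandit over datasets whose sampling distribution stays softly anchored to the original proportions. The sketch below is one plausible reading of that idea, not the paper's formula: it assumes the sampling probability is proportional to `prior_i * exp(value_i / temperature)`, and that each dataset's value is a running average of its rewards. The function names, the step size, the three-dataset collection, and the `simulated_gain` numbers (a stand-in for a look-ahead reward, which the abstract does not define in detail) are all hypothetical.

```python
import math
import random


def prior_scaled_boltzmann(prior, values, temperature=1.0):
    """Return p_i proportional to prior_i * exp(values_i / temperature).

    Assumed form, not taken from the paper. Every prior must be > 0.
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + v / temperature for p, v in zip(prior, values)]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


def update_value(old, reward, step_size=0.1):
    """Exponential moving average of the rewards seen for one dataset."""
    return (1 - step_size) * old + step_size * reward


# Hypothetical collection: three datasets and their original proportions.
prior = [0.6, 0.3, 0.1]
values = [0.0, 0.0, 0.0]

# With no reward signal yet, sampling follows the original proportions.
initial = prior_scaled_boltzmann(prior, values)

# Stand-in rewards: only the smallest dataset helps the model here.
# A real reward would be measured on the model during training.
simulated_gain = [0.0, 0.0, 1.0]
rng = random.Random(0)
for _ in range(200):
    probs = prior_scaled_boltzmann(prior, values)
    arm = rng.choices(range(len(prior)), weights=probs)[0]
    values[arm] = update_value(values[arm], simulated_gain[arm])

final = prior_scaled_boltzmann(prior, values)
print("initial:", initial)
print("final:  ", final)
```

Under these assumptions the useful dataset gains probability mass, while the prior factor keeps the other datasets from being dropped to zero.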