DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, Yeyun Gong


Abstract
As numerous instruction-tuning datasets continue to emerge, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model’s performance at its current state. We demonstrate that DynamixSFT effectively optimizes the TÜLU-2-mixture and TÜLU-3-mixture collections across 10 benchmarks, while introducing minimal computational overhead over naive sampling. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.
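The abstract does not state the formulas, so the following is only an illustrative sketch of one way a prior-anchored Boltzmann sampler over datasets could look, not the authors' implementation. It assumes the sampling probability of dataset i is proportional to prior_i * exp(reward_i / temperature), and that reward estimates are smoothed with an exponential moving average. The function names, the temperature and alpha parameters, and the moving-average update are all hypothetical choices made for illustration.

```python
import math


def prior_scaled_boltzmann(priors, rewards, temperature=1.0):
    """Assumed form: p_i proportional to priors[i] * exp(rewards[i] / temperature)."""
    if len(priors) != len(rewards):
        raise ValueError("priors and rewards must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max reward before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    max_reward = max(rewards)
    weights = [
        p * math.exp((r - max_reward) / temperature)
        for p, r in zip(priors, rewards)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def update_reward(estimate, observed, alpha=0.1):
    """Hypothetical smoothing step: exponential moving average of rewards."""
    return (1.0 - alpha) * estimate + alpha * observed


# Example: three datasets whose original proportions are 50%, 30%, and 20%.
priors = [0.5, 0.3, 0.2]
rewards = [0.0, 0.0, 0.0]
initial_probs = prior_scaled_boltzmann(priors, rewards)

# Suppose a look-ahead step reports that dataset 2 helped the most.
rewards[2] = update_reward(rewards[2], observed=1.0, alpha=0.5)
updated_probs = prior_scaled_boltzmann(priors, rewards)
```

Under this assumed form, equal reward estimates reproduce the original dataset proportions exactly, and a higher estimate for one dataset raises its sampling probability relative to its prior share, which is one reading of "softly anchors" in the abstract.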
Anthology ID:
2026.findings-acl.1972
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
39590–39603
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1972/
Cite (ACL):
Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, and Yeyun Gong. 2026. DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39590–39603, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections (Shin et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1972.pdf
Checklist:
 2026.findings-acl.1972.checklist.pdf