Fine-Grained Data Ordering Improves Fine-Tuning for Large Language Models

Xiaomeng Hu, Yixuan Tang, Haoze Li, Hao Chen, Qi Zhang, Zhanming Shen, Yiming Zhang, Haobo Wang, Junbo Zhao


Abstract
With the rapid progress of large language models (LLMs), aligning a general-purpose model with downstream tasks through fine-tuning has become a central research focus. Selecting only high-quality examples for training has been shown to be one of the most effective ways to improve fine-tuning performance. However, prior work concentrates almost exclusively on data preprocessing, that is, filtering and cleaning data before training begins, while the order and composition of data during training have received little fine-grained attention. To fill this gap, we propose Fine-Grained Order Fine-Tuning (FOT), a method that schedules the order of training data at a fine granularity within each epoch. Drawing on curriculum-learning principles, FOT defines data difficulty based on the relevance between the data and the model, and then dynamically schedules the training order in each epoch according to that difficulty. In both large-scale continued pre-training and small-scale supervised fine-tuning experiments, FOT achieves an average 2.4% improvement over baselines. Our study offers a new perspective on data governance in the fine-tuning phase.
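The per-epoch scheduling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how difficulty is computed or in which direction examples are ordered, so the easy-to-hard ordering, the function name `curriculum_epochs`, and the toy `toy_difficulty` score below are all assumptions made for illustration only.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_epochs(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    num_epochs: int,
) -> Iterator[List[T]]:
    """Yield one training order per epoch, lowest difficulty first.

    `difficulty` is a hypothetical scoring callable. It is re-evaluated at
    the start of every epoch, so the order can change as the model that the
    callable depends on is updated.
    """
    for _ in range(num_epochs):
        # sorted() is stable, so tied examples keep their original order.
        yield sorted(examples, key=difficulty)


# Toy stand-in for a model: it "prefers" texts of a certain length.
# Assumption: difficulty is the distance from that preferred length.
state = {"preferred_len": 1}


def toy_difficulty(text: str) -> float:
    return abs(len(text) - state["preferred_len"])


data = ["aaaa", "a", "aa"]
orders = []
for order in curriculum_epochs(data, toy_difficulty, num_epochs=2):
    orders.append(order)
    # Pretend that training during this epoch shifted the model.
    state["preferred_len"] = 4
```

In this toy run the second epoch's order differs from the first because the difficulty scores are recomputed after the simulated model update. In a real setting the score would come from the model being fine-tuned rather than from text length.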
Anthology ID:
2026.findings-acl.1021
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20406–20418
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1021/
Cite (ACL):
Xiaomeng Hu, Yixuan Tang, Haoze Li, Hao Chen, Qi Zhang, Zhanming Shen, Yiming Zhang, Haobo Wang, and Junbo Zhao. 2026. Fine-Grained Data Ordering Improves Fine-Tuning for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20406–20418, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Fine-Grained Data Ordering Improves Fine-Tuning for Large Language Models (Hu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1021.pdf
Checklist:
 2026.findings-acl.1021.checklist.pdf