Practical Guidelines for Model Merging in LLMs Pre-Training

Giuseppe Curci, Stefano Simonazzi, Andrea Molinari, Andrea Zugarini


Abstract
Model merging is widely used to combine fine-tuned models trained with different data distributions, tasks, or hyperparameters, yet its role during LLM pre-training remains underexplored. We systematically study checkpoint merging across training phases, focusing on the transition from stable to decaying learning rates. Across multiple scales, we find that simple averaging methods consistently improve performance during stable learning rate regimes, but gains sharply diminish during decay. We link this effect to reduced checkpoint diversity and show that merging effectiveness correlates with parameter-space variation. Strategies such as synthetic variability, task-vector merging, and cross-run merging yield only modest improvements. Our results provide practical insights into when merging is most effective in large-scale pre-training.
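The three operations the abstract refers to (simple checkpoint averaging, task-vector merging, and measuring parameter-space variation among checkpoints) can be illustrated with a small sketch. This is not the authors' implementation. It assumes each checkpoint is a dict mapping parameter names to flat lists of floats (real checkpoints hold tensors). The task-vector form used here, base + scale * mean(checkpoint - base), is the common textbook variant; the choice of base checkpoint and scale is hypothetical. The dispersion metric (mean Euclidean distance to the centroid) is likewise an illustrative assumption and not necessarily the variation measure used in the paper.

```python
import math
from typing import Dict, List

# Assumption for illustration: a checkpoint maps parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(ckpts: List[Checkpoint]) -> Checkpoint:
    """Uniform (simple) average of checkpoints sharing parameter names and sizes."""
    if not ckpts:
        raise ValueError("need at least one checkpoint")
    n = len(ckpts)
    merged: Checkpoint = {}
    for name in ckpts[0]:
        # Element-wise mean of this parameter across all checkpoints.
        merged[name] = [sum(vals) / n for vals in zip(*(c[name] for c in ckpts))]
    return merged


def task_vector_merge(base: Checkpoint, ckpts: List[Checkpoint], scale: float = 1.0) -> Checkpoint:
    """Add the scaled mean task vector (ckpt - base) back onto the base checkpoint.

    The mean of the task vectors equals average(ckpts) - base, so scale=1.0
    reduces to plain uniform averaging.
    """
    avg = average_checkpoints(ckpts)
    return {
        name: [b + scale * (a - b) for b, a in zip(base[name], avg[name])]
        for name in base
    }


def parameter_dispersion(ckpts: List[Checkpoint]) -> float:
    """Hypothetical diversity measure: mean L2 distance of each checkpoint to the centroid."""
    centroid = average_checkpoints(ckpts)
    total = 0.0
    for c in ckpts:
        sq = sum(
            (x - m) ** 2
            for name in centroid
            for x, m in zip(c[name], centroid[name])
        )
        total += math.sqrt(sq)
    return total / len(ckpts)


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(average_checkpoints(ckpts))                        # {'w': [2.0, 3.0]}
    print(task_vector_merge({"w": [0.0, 0.0]}, ckpts, 0.5))  # {'w': [1.0, 1.5]}
    print(parameter_dispersion(ckpts))                       # sqrt(2), about 1.4142
```

Under this sketch, checkpoints that have drifted only slightly apart (as the abstract describes for the learning-rate decay phase) yield a dispersion near zero, and their average is then nearly identical to any single checkpoint, which gives one intuition for why merging gains shrink.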
Anthology ID: 2026.acl-industry.105
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1519–1532
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.105/
Cite (ACL): Giuseppe Curci, Stefano Simonazzi, Andrea Molinari, and Andrea Zugarini. 2026. Practical Guidelines for Model Merging in LLMs Pre-Training. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1519–1532, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Practical Guidelines for Model Merging in LLMs Pre-Training (Curci et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.105.pdf