Practical Guidelines for Model Merging in LLMs Pre-Training
Giuseppe Curci, Stefano Simonazzi, Andrea Molinari, Andrea Zugarini
Abstract
Model merging is widely used to combine fine-tuned models trained with different data distributions, tasks, or hyperparameters, yet its role during LLM pre-training remains underexplored. We systematically study checkpoint merging across training phases, focusing on the transition from stable to decaying learning rates. Across multiple scales, we find that simple averaging methods consistently improve performance during stable learning rate regimes, but gains sharply diminish during decay. We link this effect to reduced checkpoint diversity and show that merging effectiveness correlates with parameter-space variation. Strategies such as synthetic variability, task-vector merging, and cross-run merging yield only modest improvements. Our results provide practical insights on when merging is most effective in large-scale pre-training.
- Anthology ID:
- 2026.acl-industry.105
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Yunyao Li, Georg Rehm, Mei Tu
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1519–1532
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-industry.105/
- Cite (ACL):
- Giuseppe Curci, Stefano Simonazzi, Andrea Molinari, and Andrea Zugarini. 2026. Practical Guidelines for Model Merging in LLMs Pre-Training. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1519–1532, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- Practical Guidelines for Model Merging in LLMs Pre-Training (Curci et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-industry.105.pdf