On the Limits of Model Merging for Multilinguality in Pre-Training
Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima’an
Abstract
Endowing models with consistent multilingual performance can be achieved by _mixing_ pre-training data, or post-training approaches such as language-specific model _merging_. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
- Anthology ID:
- 2026.mellm-1.15
- Volume:
- Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, United States
- Editors:
- Kaiyu Huang, Fengran Mo, Pinzhen Chen, Meng Jiang
- Venues:
- MeLLM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 159–169
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.mellm-1.15/
- Cite (ACL):
- Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, and Khalil Sima’an. 2026. On the Limits of Model Merging for Multilinguality in Pre-Training. In Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026), pages 159–169, San Diego, United States. Association for Computational Linguistics.
- Cite (Informal):
- On the Limits of Model Merging for Multilinguality in Pre-Training (Aycock et al., MeLLM 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.mellm-1.15.pdf