On the Limits of Model Merging for Multilinguality in Pre-Training

Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima’an


Abstract
Endowing models with consistent multilingual performance can be achieved either by _mixing_ pre-training data or by post-training approaches such as language-specific model _merging_. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
Anthology ID:
2026.mellm-1.15
Volume:
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
Month:
July
Year:
2026
Address:
San Diego, United States
Editors:
Kaiyu Huang, Fengran Mo, Pinzhen Chen, Meng Jiang
Venues:
MeLLM | WS
Publisher:
Association for Computational Linguistics
Pages:
159–169
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.mellm-1.15/
Cite (ACL):
Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, and Khalil Sima’an. 2026. On the Limits of Model Merging for Multilinguality in Pre-Training. In Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026), pages 159–169, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):
On the Limits of Model Merging for Multilinguality in Pre-Training (Aycock et al., MeLLM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.mellm-1.15.pdf