The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs

Lucas Bandarkar, Nanyun Peng


Abstract
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce. Building on prior work, we first validate that the subsets of model parameters that matter most for mathematical reasoning and multilingual capabilities are distinctly non-overlapping. To exploit this implicit separability between task and target-language parameterization, we develop and analyze numerous modular frameworks to improve how the two are composed during fine-tuning. These methods generally employ parameter freezing or post hoc model merging to assign math and language improvements to different key parts of the LLM. In the absence of in-language math data, we demonstrate that the modular approaches successfully improve upon baselines across three languages, four models, and two fine-tuning paradigms (full and LoRA). Somewhat surprisingly, the most consistently successful modular method is to fine-tune separate language and math experts and merge them via Layer-Swapping. We offer possible explanations for this result via recent work on the linearity of task vectors, and we further support it by empirically showing that reverting less useful fine-tuning updates after training often outperforms freezing them from the start.
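The abstract's best-performing method is to fine-tune separate language and math experts from the same base model and then merge them by swapping layers. The snippet below is a minimal sketch of that kind of layer-level merge on Hugging Face-style state dicts; the function name, the parameter-name pattern, and the choice of which layers come from which expert are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of layer-swapping model merging between two experts
# fine-tuned from the same base model. The regex and the layer split
# below are hypothetical; consult the paper for the actual selection.
import re

def layer_swap_merge(math_state, lang_state, lang_layers):
    """Build a merged state dict: take the listed transformer layers
    from the language expert, everything else from the math expert."""
    merged = {}
    for name, tensor in math_state.items():
        match = re.search(r"layers\.(\d+)\.", name)
        if match and int(match.group(1)) in lang_layers:
            merged[name] = lang_state[name].clone()
        else:
            merged[name] = tensor.clone()
    return merged

# Example (hypothetical layer choice): take the bottom and top layers
# from the language expert, keep the middle from the math expert.
# merged = layer_swap_merge(math_expert_state, lang_expert_state,
#                           lang_layers={0, 1, 2, 29, 30, 31})
```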
Anthology ID:
2025.mrl-main.10
Volume:
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
David Ifeoluwa Adelani, Catherine Arnett, Duygu Ataman, Tyler A. Chang, Hila Gonen, Rahul Raja, Fabian Schmidt, David Stap, Jiayi Wang
Venues:
MRL | WS
Publisher:
Association for Computational Linguistics
Pages:
131–148
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.10/
Cite (ACL):
Lucas Bandarkar and Nanyun Peng. 2025. The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 131–148, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs (Bandarkar & Peng, MRL 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.10.pdf