@inproceedings{bandarkar-peng-2025-unreasonable,
    title = "The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in {LLM}s",
    author = "Bandarkar, Lucas  and
      Peng, Nanyun",
    editor = "Adelani, David Ifeoluwa  and
      Arnett, Catherine  and
      Ataman, Duygu  and
      Chang, Tyler A.  and
      Gonen, Hila  and
      Raja, Rahul  and
      Schmidt, Fabian  and
      Stap, David  and
      Wang, Jiayi",
    booktitle = "Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.10/",
    pages = "131--148",
    ISBN = "979-8-89176-345-6",
    abstract = "Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce. Building on prior work, we first validate that the subsets of model parameters that matter most for mathematical reasoning and multilingual capabilities are distinctly non-overlapping. To exploit this implicit separability between task and target language parameterization, we develop and analyze numerous modular frameworks to improve the composition of the two during fine-tuning. These methods generally employ freezing parameters or post hoc model merging to assign math and language improvement to different key parts of the LLM. In the absence of in-language math data, we demonstrate that the modular approaches successfully improve upon baselines across three languages, four models, and two fine-tuning paradigms (full and LoRA). Furthermore, we identify the most consistently successful modular method to be fine-tuning separate language and math experts and model merging via Layer-Swapping, somewhat surprisingly. We offer possible explanations for this result via recent works on the linearity of task vectors. We further explain this by empirically showing that reverting less useful fine-tuning updates after training often outperforms freezing them from the start."
}
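
The abstract's best-performing recipe (fine-tuning separate language and math experts, then composing them post hoc via Layer-Swapping) can be pictured as state-dict surgery: copy one expert and overwrite a chosen subset of its layers with the other expert's parameters. The toy architecture, layer indices, and helper names below are assumptions for illustration only, not the paper's implementation.

```python
# Toy sketch of layer-swapping model merging between two fine-tuned "experts".
# Architecture, layer indices, and helper names are illustrative assumptions,
# not the configuration used in the paper.
import copy
import torch
import torch.nn as nn


def make_toy_expert(num_layers: int = 8, dim: int = 16, seed: int = 0) -> nn.Sequential:
    """Stand-in for a fine-tuned transformer: a stack of identical blocks."""
    torch.manual_seed(seed)
    return nn.Sequential(*(nn.Linear(dim, dim) for _ in range(num_layers)))


def layer_swap(math_expert: nn.Module, lang_expert: nn.Module, swap_layers) -> nn.Module:
    """Start from a copy of the math expert and overwrite the chosen layer
    indices with the corresponding parameters of the language expert."""
    merged = copy.deepcopy(math_expert)
    merged_sd = merged.state_dict()
    lang_sd = lang_expert.state_dict()
    for name, tensor in lang_sd.items():
        layer_idx = int(name.split(".")[0])  # nn.Sequential names look like "3.weight"
        if layer_idx in swap_layers:
            merged_sd[name] = tensor.clone()
    merged.load_state_dict(merged_sd)
    return merged


# Different seeds stand in for two fine-tuning runs from the same base model;
# here the outer layers come from the language expert, the middle from math.
math_expert = make_toy_expert(seed=1)
lang_expert = make_toy_expert(seed=2)
merged = layer_swap(math_expert, lang_expert, swap_layers={0, 1, 6, 7})
```

The same parameter-level swap applies to any two checkpoints that share an architecture, which matches the abstract's framing of merging after separate fine-tuning runs rather than during training.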