Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs

Sruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong, Mahdi Namazifar


Abstract
We introduce a zero-shot merging framework for large language models (LLMs) that consolidates specialized domain experts into a single model without any further training. Our core contribution lies in leveraging relative task vectors—difference representations encoding each expert’s unique traits with respect to a shared base model—to guide a principled and efficient merging process. By dissecting parameters into common dimensions (averaged across experts) and complementary dimensions (unique to each expert), we strike an optimal balance between generalization and specialization. We further devise a compression mechanism for the complementary parameters, retaining only principal components and scalar multipliers per expert, thereby minimizing overhead. A dynamic router then selects the most relevant domain at inference, ensuring that domain-specific precision is preserved. Experiments on code generation, mathematical reasoning, medical question answering, and instruction-following benchmarks confirm the versatility and effectiveness of our approach. Altogether, this framework enables truly adaptive and scalable LLMs that seamlessly integrate specialized knowledge for improved zero-shot performance.
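As a rough illustration only, and not the paper's actual algorithm, the sketch below shows how the ingredients named in the abstract (relative task vectors, a common/complementary split, and principal-component compression of the complementary part) could fit together for a single weight matrix. The function names, the rank parameter, and the use of a truncated SVD are assumptions made for exposition.

# Illustrative sketch only -- not the paper's exact algorithm. Function names,
# the rank parameter, and the SVD-based compression are assumptions.
import numpy as np

def split_merge_layer(base_w, expert_ws, rank=8):
    """Merge one weight matrix from several expert models against a shared base.

    base_w    : (d_out, d_in) weights of the shared base model
    expert_ws : list of (d_out, d_in) weights, one per domain expert
    rank      : number of principal components kept per expert (assumed value)
    """
    # Relative task vectors: each expert's difference from the base model.
    deltas = [w - base_w for w in expert_ws]

    # Common dimensions: the component shared across experts, here taken as
    # the average task vector and folded into the merged weights.
    common = np.mean(deltas, axis=0)

    # Complementary dimensions: what remains unique to each expert,
    # compressed to a few principal components via a truncated SVD.
    compressed = []
    for delta in deltas:
        residual = delta - common
        u, s, vt = np.linalg.svd(residual, full_matrices=False)
        compressed.append((u[:, :rank], s[:rank], vt[:rank]))

    merged = base_w + common
    return merged, compressed

def apply_expert(merged, compressed, expert_idx, scale=1.0):
    """Re-inject one expert's complementary component, e.g. after a router
    has selected expert_idx for the current input; scale acts as a per-expert
    scalar multiplier."""
    u, s, vt = compressed[expert_idx]
    return merged + scale * (u * s) @ vt

In this sketch, the per-expert storage drops from d_out * d_in parameters to rank * (d_out + d_in + 1), which is the kind of memory saving the abstract's compression mechanism is aiming at.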
Anthology ID:
2025.emnlp-main.1533
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
30135–30154
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1533/
Cite (ACL):
Sruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong, and Mahdi Namazifar. 2025. Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30135–30154, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs (Gorantla et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1533.pdf
Checklist:
 2025.emnlp-main.1533.checklist.pdf