FuseChat: Knowledge Fusion of Chat Models
Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan
Abstract
While training large language models (LLMs) from scratch can yield models with distinct capabilities and strengths, it incurs substantial costs and may produce redundant competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM development. In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. First, we conduct pairwise knowledge fusion on source chat LLMs of varying structures and scales to create multiple target LLMs with identical structure and size via lightweight fine-tuning. During this process, we introduce a statistics-based token alignment approach as the cornerstone for fusing LLMs with different structures. Second, we merge these target LLMs within the parameter space, where we propose a novel method for determining the merging coefficients based on the magnitude of parameter updates before and after fine-tuning. We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales. Experimental results on two instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, demonstrate the superiority of FuseChat-7B over baselines of various sizes.
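The merging stage described in the abstract admits a compact sketch. Below is a minimal PyTorch-style illustration, assuming coefficients proportional to the squared L2 norm of each target model's parameter updates, computed per parameter matrix; the function name, the choice of statistic, and the per-matrix granularity are illustrative assumptions rather than the authors' released implementation.

```python
import torch

def merge_target_models(base_sd, target_sds, eps=1e-12):
    """Merge fine-tuned target models in parameter space.

    Sketch of the idea in the abstract: each target model's merging
    coefficient is derived from the magnitude of its parameter updates
    relative to the shared pre-fine-tuning base. The squared L2 norm
    and per-matrix granularity are assumptions for illustration.
    """
    merged = {}
    for name, base_w in base_sd.items():
        # Parameter updates of each target model for this matrix.
        deltas = [sd[name] - base_w for sd in target_sds]
        # Update magnitudes, normalized into merging coefficients.
        mags = torch.stack([d.float().pow(2).sum() for d in deltas])
        coeffs = mags / (mags.sum() + eps)
        weighted = sum(c * d for c, d in zip(coeffs, deltas))
        merged[name] = base_w + weighted.to(base_w.dtype)
    return merged
```

Here `base_sd` would be the state dict shared by all targets before pairwise fusion and `target_sds` their state dicts afterwards; because the targets have identical structure and size, keys and shapes line up, so the merge reduces to a weighted sum of parameter deltas.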
- Anthology ID: 2025.emnlp-main.1096
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 21629–21653
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1096/
- Cite (ACL): Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, and Xiaojun Quan. 2025. FuseChat: Knowledge Fusion of Chat Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 21629–21653, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): FuseChat: Knowledge Fusion of Chat Models (Wan et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1096.pdf