Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models

Zenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou, Lichao Sun


Abstract
Model merging for Large Language Models (LLMs) directly fuses the parameters of different models fine-tuned on various tasks, creating a unified model for multi-domain tasks. However, because models available on open-source platforms may harbor hidden vulnerabilities, model merging is susceptible to backdoor attacks. In this paper, we propose Merge Hijacking, the first backdoor attack targeting model merging in LLMs. The attacker constructs a malicious upload model and releases it. Once a victim user merges it with any other models, the resulting merged model inherits the backdoor while maintaining utility across tasks. Merge Hijacking defines two main objectives—effectiveness and utility—and achieves them through four steps. Extensive experiments demonstrate the effectiveness of our attack across different models, merging algorithms, and tasks. Additionally, we show that the attack remains effective even when merging real-world models. Moreover, our attack demonstrates robustness against two inference-time defenses (Paraphrasing and CLEANGEN) and one training-time defense (Fine-pruning).
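The abstract's notion of "directly fusing the parameters" of fine-tuned models can be illustrated with the simplest merging baseline: element-wise weighted averaging of corresponding parameters. A minimal sketch, assuming all models share an architecture; `merge_models` and the toy parameter dicts below are illustrative only, not the paper's code, and the paper evaluates several merging algorithms beyond plain averaging:

```python
# Minimal sketch of parameter-space model merging via weighted averaging.
# Models are represented as dicts mapping parameter names to flat lists
# of floats (standing in for real tensors).

def merge_models(models, weights=None):
    """Return an element-wise weighted average of the models' parameters."""
    if weights is None:
        # Default to uniform weights across all input models.
        weights = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        params = [m[name] for m in models]
        merged[name] = [
            sum(w * p[i] for w, p in zip(weights, params))
            for i in range(len(params[0]))
        ]
    return merged

# Two toy "fine-tuned" models with identical parameter names/shapes.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}

merged = merge_models([model_a, model_b])
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [1.0]}
```

The attack setting in the paper follows from this picture: because a malicious upload model contributes directly to the averaged parameters, its backdoor can survive in the merged result.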
Anthology ID:
2025.acl-long.1571
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
32688–32703
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1571/
Cite (ACL):
Zenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou, and Lichao Sun. 2025. Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32688–32703, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models (Yuan et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1571.pdf