LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao


Abstract
Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts where enhanced general capabilities degrade safety safeguards. We identify two root causes: neuron misidentification due to simplistic parameter magnitude-based selection, and cross-task neuron interference during merging. To address these challenges, we propose LED-Merging, a three-stage framework that Locates task-specific neurons via gradient-based attribution, dynamically Elects critical neurons through multi-model importance fusion, and Disjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and Llama2-13B demonstrate that LED-Merging effectively reduces harmful response rates, showing a 31.4% decrease on Llama-3-8B-Instruct on HarmBench, while simultaneously preserving 95% of utility performance, such as achieving 52.39% accuracy on GSM8K. LED-Merging resolves safety-utility conflicts and provides a lightweight, training-free paradigm for constructing reliable multi-task LLMs. Code is available at https://github.com/MqLeet/LED-Merging
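The three stages named in the abstract can be illustrated with a toy sketch on flat parameter vectors. This is not the authors' implementation (see the linked repository for that); everything below is an assumption made for illustration: the function names (`locate`, `elect`, `disjoint`, `led_merge_sketch`), the attribution score |gradient × update|, fusion by summing scores across models, top-k selection, and the rule that earlier-listed tasks win contested parameters.

```python
from typing import Dict, List, Set


def locate(delta: List[float], grad: List[float]) -> List[float]:
    """Score each parameter by |gradient * update| (an assumed attribution rule)."""
    return [abs(g * d) for g, d in zip(grad, delta)]


def elect(score_lists: List[List[float]], k: int) -> Set[int]:
    """Fuse scores from several models by summation, then keep the top-k indices."""
    fused = [sum(column) for column in zip(*score_lists)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return set(ranked[:k])


def disjoint(elected: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Give each index to at most one task; earlier tasks in the dict win conflicts."""
    taken: Set[int] = set()
    owned: Dict[str, Set[int]] = {}
    for task, indices in elected.items():
        owned[task] = indices - taken
        taken |= owned[task]
    return owned


def led_merge_sketch(
    base: List[float],
    deltas: Dict[str, List[float]],
    grads: Dict[str, List[List[float]]],
    k: int,
) -> List[float]:
    """Locate, elect, and disjoint, then add each task's isolated update to the base."""
    elected = {
        task: elect([locate(deltas[task], g) for g in grads[task]], k)
        for task in deltas
    }
    owned = disjoint(elected)
    merged = list(base)
    for task, indices in owned.items():
        for i in indices:
            merged[i] += deltas[task][i]
    return merged


# Hypothetical toy data: four parameters, two tasks, two gradient sources per task.
base = [1.0, 1.0, 1.0, 1.0]
deltas = {
    "safety": [0.5, -0.2, 0.1, 0.0],
    "math": [0.4, 0.3, -0.6, 0.1],
}
grads = {
    "safety": [[1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 1.0, 1.0]],
    "math": [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
}
merged = led_merge_sketch(base, deltas, grads, k=2)
print(merged)
```

In this toy run both tasks elect parameter 0, so the disjoint step leaves it with the safety task (listed first) and the math task keeps only parameter 2. Parameter 3 is elected by neither task and stays at its base value.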
Anthology ID:
2025.acl-long.1055
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
21749–21767
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1055/
Cite (ACL):
Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, and Jing Shao. 2025. LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21749–21767, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint (Ma et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1055.pdf