Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models

Yue Li, Xin Yi, Dongsheng Shi, Gerard De Melo, Xiaoling Wang, Linlin Wang


Abstract
With the increasing size of Large Vision-Language Models (LVLMs), network pruning techniques aimed at compressing models for deployment in resource-constrained environments have garnered significant attention. However, we observe that pruning often leads to a degradation in safety performance. To address this issue, we present a novel and lightweight approach, termed Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety performance. To our knowledge, this is the first work explicitly focused on restoring safety in LVLMs post-pruning.
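The two-level procedure described in the abstract (rank attention heads by safety contribution, then restore selected pruned units inside the top-ranked heads) can be illustrated with a minimal sketch. The abstract does not say how safety contributions are computed, so the scores below are hypothetical inputs, and the function name `hierarchical_restore`, its parameters, and the flat per-head weight lists are all assumptions made for illustration. This is not the authors' implementation.

```python
def hierarchical_restore(original, pruned, head_scores, unit_scores,
                         top_heads, restore_per_head):
    """Two-stage restoration sketch: head level first, then unit level.

    original, pruned : dict head_id -> list of weights (pruned entries are 0.0)
    head_scores      : dict head_id -> float, hypothetical safety contribution
    unit_scores      : dict head_id -> list of floats, hypothetical per-unit
                       safety scores aligned with the weight lists
    """
    # Stage 1: keep the heads with the highest (assumed) safety contribution.
    critical = sorted(head_scores, key=head_scores.get, reverse=True)[:top_heads]

    # Start from a copy of the pruned weights so the inputs stay untouched.
    restored = {h: list(w) for h, w in pruned.items()}

    for h in critical:
        # Stage 2: candidates are positions that pruning zeroed out.
        candidates = [
            i for i, (o, p) in enumerate(zip(original[h], pruned[h]))
            if p == 0.0 and o != 0.0
        ]
        # Restore only the highest-scoring candidates within this head.
        candidates.sort(key=lambda i: unit_scores[h][i], reverse=True)
        for i in candidates[:restore_per_head]:
            restored[h][i] = original[h][i]

    return restored, critical


# Toy data: three heads with four weights each (all values are made up).
original = {0: [0.5, -0.2, 0.1, 0.9],
            1: [0.3, 0.4, -0.7, 0.05],
            2: [1.0, -0.1, 0.2, 0.6]}
pruned = {0: [0.5, 0.0, 0.0, 0.9],
          1: [0.0, 0.4, -0.7, 0.0],
          2: [1.0, 0.0, 0.0, 0.6]}
head_scores = {0: 0.1, 1: 0.8, 2: 0.5}
unit_scores = {0: [0.0, 0.9, 0.1, 0.0],
               1: [0.7, 0.0, 0.0, 0.2],
               2: [0.0, 0.3, 0.6, 0.0]}

restored, critical = hierarchical_restore(
    original, pruned, head_scores, unit_scores,
    top_heads=2, restore_per_head=1,
)
```

In this toy run, heads 1 and 2 are treated as critical and each regains one pruned weight, while head 0 stays fully pruned. Restricting restoration to a few units in a few heads is what keeps such a scheme lightweight, since most of the sparsity introduced by pruning is preserved.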
Anthology ID:
2025.findings-acl.394
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7600–7612
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.394/
Cite (ACL):
Yue Li, Xin Yi, Dongsheng Shi, Gerard De Melo, Xiaoling Wang, and Linlin Wang. 2025. Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7600–7612, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.394.pdf