Hongxing Wang

Other people with similar names: Hongxing Wang


2026

Large Vision-Language Models (LVLMs) confront an escalating threat from sophisticated multimodal jailbreak attacks. However, existing defense strategies suffer from three critical limitations: (1) the neglect of visual threats; (2) a lack of fine-grained specificity regarding specific attack semantics; and (3) the absence of a dedicated jailbreak detection mechanism, which leads to unnecessary defensive measures against benign inputs. To address these limitations, we propose ReCon, a novel black-box defense framework. ReCon integrates a diffusion-based image purifier to neutralize visual perturbations and an autoencoder-based detector for anomaly filtration. At its core, it employs a Reverse Safety Concept Injection module that maps detected unsafe concepts to fine-grained, constructive Safe Concepts, generating targeted prompts to precisely rectify attack semantics. Extensive experiments demonstrate that ReCon significantly enhances the robustness of LVLMs against jailbreak attacks while preserving performance on benign tasks. Disclaimer: Samples in this paper may be harmful and cause discomfort.