Xiaofan Zheng

2026

UMMF: Protecting Copyright of Large Vision-Language Models through Unlearning-based Multimodal Memorization Fingerprint
Xiaofan Zheng | Xinghao Wang | Xiaojun Wan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Training Large Vision-Language Models (LVLMs) is costly and resource-intensive, making them valuable assets. To prevent malicious users from unauthorized commercialization of these artificial intelligence assets through fine-tuning and black-box deployment, model fingerprinting techniques aimed at verifying the ownership of LVLMs are receiving widespread attention. Existing fingerprinting techniques rely on adversarial attacks or backdoor attacks to construct trigger images for specific outputs, attributing model ownership by comparing whether the output of trigger images on suspected models matches the predetermined output. However, these methods depend on fixed-form triggers as explicit model fingerprints, which have limitations in terms of stealthiness and robustness. Inspired by unlearning research, we propose Unlearning-based Multimodal Memorization Fingerprint (UMMF). UMMF strengthens the overfitting characteristics of training samples by unlearning neighboring samples of the training samples, thereby introducing detectable regions of poor generalization in the data manifold. Compared with previous methods, our approach leverages the differences in memorization strength of LVLMs on neighboring samples as implicit model fingerprints, rather than relying on specific input-output pairs as explicit triggers. This endows it with stronger stealthiness, robustness, and adaptability. To simulate real application scenarios, we conduct extensive experiments using multiple strategies and different datasets, further demonstrating its superiority in protecting LVLM ownership.

Ghost in the Shell: Synonym-Aware Logit Shaping Fingerprint for Copyright Protection of Large Vision-Language Models
Xiaofan Zheng | Xinghao Wang | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL 2026

The proliferation of Large Vision-Language Models (LVLMs) has exacerbated concerns regarding model misappropriation and license violations. Malicious users may deploy open-source models as black boxes and falsely claim ownership, sparking significant community interest in fingerprinting techniques for copyright authentication. Current fingerprinting methods largely follow a backdoor-based paradigm, employing specific inputs to elicit predetermined abnormal text outputs. However, such direct distortion of the model’s original predictions compromises modality alignment and inevitably degrades multimodal capabilities, leading to an inherent trade-off between robustness and harmlessness. To address these challenges, we investigate whether it is possible to embed robust fingerprints while maximally preserving the original normal outputs of the model. We propose a Synonym-Aware Logit Shaping Fingerprint (SALSF). The core insight of SALSF lies in reshaping the probability distribution of semantically similar long-tail tokens within the logits space while ensuring the original top-1 prediction token and its probability remain approximately invariant. By elevating the overall prediction probability of the semantic cluster to a level distinctly higher than the natural baseline, our approach stealthily embeds the fingerprint and mitigates the disruption to modality alignment. Experimental results demonstrate that SALSF maintains multimodal performance and substantially enhances fingerprint robustness, offering a novel paradigm for the intellectual property protection of LVLMs.

2025

Unveiling Fake News with Adversarial Arguments Generated by Multimodal Large Language Models
Xiaofan Zheng | Minnan Luo | Xinghao Wang
Proceedings of the 31st International Conference on Computational Linguistics

In the era of social media, the proliferation of fake news has created an urgent need for more effective detection methods, particularly for multimodal content. The task of identifying fake news is highly challenging, as it requires broad background knowledge and understanding across various domains. Existing detection methods primarily rely on neural networks to learn latent feature representations, resulting in black-box classifications with limited real-world understanding. To address these limitations, we propose a novel approach that leverages Multimodal Large Language Models (MLLMs) for fake news detection. Our method introduces adversarial reasoning through debates from opposing perspectives. By harnessing the powerful capabilities of MLLMs in text generation and cross-modal reasoning, we guide these models to engage in multimodal debates, generating adversarial arguments based on contradictory evidence from both sides of the issue. We then utilize these arguments to learn reasonable thinking patterns, enabling better multimodal fusion and fine-tuning. This process effectively positions our model as a debate referee for adversarial inference. Extensive experiments conducted on four fake news detection datasets demonstrate that our proposed method significantly outperforms state-of-the-art approaches.

Tracing Training Footprints: A Calibration Approach for Membership Inference Attacks Against Multimodal Large Language Models
Xiaofan Zheng | Huixuan Zhang | Xiaojun Wan
Findings of the Association for Computational Linguistics: EMNLP 2025

With the increasing scale of training data for Multimodal Large Language Models (MLLMs) and the lack of data details, there is growing concern about privacy breaches and data security issues. Under black-box access, exploring effective Membership Inference Attacks (MIA) has garnered increasing attention. In real-world applications, where most samples are non-members, the issue of non-members being over-represented in the data manifold, leading to misclassification as member samples, becomes more prominent. This has motivated recent work to focus on developing effective difficulty calibration strategies, producing promising results. However, these methods only consider text-only input during calibration, and their effectiveness is diminished when migrated to MLLMs due to the presence of visual embeddings. To address the above problem, we propose PC-MMIA, focusing on visual instruction fine-tuning data. PC-MMIA is based on the idea that tokens located in poorly generalized local manifolds can better reflect traces of member samples that have been trained. By employing bidirectional perturbation of image embeddings to capture tokens critical to MIA and assigning them different weights, we achieve difficulty calibration. Experimental results demonstrate that our proposed method surpasses existing methods.
