Mingrui Xie


2025

DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs
Mingrui Xie | Tianxiang Xu | Qianhai Tang | Shanming Yao | Xiaofeng Zhang | Junliang Du
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Vision–Language Models (LVLMs) have garnered substantial interest owing to their impressive ability to interpret visual inputs and converse with users. Nevertheless, LVLMs still suffer from object hallucination – generating descriptions of objects that are absent from the image – which undermines reliability and hinders real-world deployment. We propose DAPE-BR, a positional-alignment scheme that (i) preserves the pretrained weight order while globally aligning visual–text distances, (ii) embeds an isotropic fused patch-distance metric, and (iii) applies a patch-distance causal mask to enforce spatial causality. Extensive experiments on POPE, MMStar and SQA show that DAPE-BR consistently reduces hallucinations and boosts overall performance.
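
As a rough illustration of the general idea only (not the authors' implementation), the sketch below builds an isotropic patch-distance matrix over an image-patch grid, turns it into a distance-aware attention bias, and masks attention beyond a distance radius. The function names, the radius parameter, and the exact bias form are assumptions made for exposition.

# Illustrative sketch: a distance-aware attention bias over image patches.
# The actual DAPE-BR formulation (weight-order preservation, fused metric,
# causal-mask construction) is not reproduced here; everything below is an
# assumption for exposition.
import torch

def patch_distance_bias(grid_h: int, grid_w: int, scale: float = 1.0):
    """Build an isotropic (Euclidean) patch-distance matrix and a simple
    distance-based attention bias for a grid of image patches."""
    ys, xs = torch.meshgrid(
        torch.arange(grid_h), torch.arange(grid_w), indexing="ij"
    )
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    dist = torch.cdist(coords, coords)      # (N, N) pairwise patch distances
    bias = -scale * dist                    # farther patches get lower scores
    return dist, bias

def distance_mask(dist: torch.Tensor, radius: float):
    """Hypothetical patch-distance mask: each patch may only attend to
    patches within a given radius (True = masked out)."""
    return dist > radius

# Usage: add the bias to attention logits and mask out far-away patches.
dist, bias = patch_distance_bias(grid_h=24, grid_w=24, scale=0.1)
mask = distance_mask(dist, radius=6.0)
logits = torch.randn(dist.shape)            # stand-in for q @ k.T / sqrt(d)
logits = logits + bias
logits = logits.masked_fill(mask, float("-inf"))
attn = torch.softmax(logits, dim=-1)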

MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models
Mingrui Xie | Lulu Xu | Junliang Du
Findings of the Association for Computational Linguistics: EMNLP 2025

Vision-language models like CLIP demonstrate exceptional generalization capabilities but face significant adaptation challenges due to parameter scale, prompt sensitivity, and cross-modal alignment difficulties. Existing approaches primarily focus on single-modality adjustments, leading to suboptimal alignment and limited generalization. We introduce MAFMO, a plug-and-play framework comprising: (1) a Harmonic Cross-Modal Adapter enabling efficient cross-modal knowledge transfer; (2) a Meta-Template Optimization module dynamically generating input-dependent templates; and (3) a Cross-Modal Knowledge Synthesis mechanism preserving critical structural relationships during adaptation. Extensive experiments across multiple fine-grained visual recognition benchmarks demonstrate that MAFMO consistently improves existing methods' performance on both novel classes and the harmonic mean, while maintaining robustness under various challenging conditions with minimal computational overhead.
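
For intuition only, the following sketch shows one way a lightweight, plug-and-play cross-modal adapter of the kind described could look: a shared bottleneck with a learned scalar that mixes visual and textual residuals. The class name, feature shapes, and mixing rule are illustrative assumptions, not MAFMO's actual design.

# Illustrative sketch: a bottleneck adapter that lets each modality borrow a
# small residual from the other. All details are assumptions for exposition.
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # shared bottleneck projection
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        # learned scalar balancing same-modality and cross-modality residuals
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        # Adapt each modality through the shared bottleneck, then mix in a
        # pooled residual from the other modality.
        img_adapt = self.up(self.act(self.down(img_feat)))
        txt_adapt = self.up(self.act(self.down(txt_feat)))
        img_out = img_feat + self.alpha * img_adapt \
            + (1 - self.alpha) * txt_adapt.mean(0, keepdim=True)
        txt_out = txt_feat + self.alpha * txt_adapt \
            + (1 - self.alpha) * img_adapt.mean(0, keepdim=True)
        return img_out, txt_out

# Usage with CLIP-sized features (image embeddings and class-prompt embeddings).
adapter = CrossModalAdapter(dim=512)
img = torch.randn(8, 512)      # e.g. CLIP image features for a batch
txt = torch.randn(100, 512)    # e.g. CLIP text features for 100 class prompts
img_out, txt_out = adapter(img, txt)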