Mingrui Xie
2025
DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs
Mingrui Xie | Tianxiang Xu | Qianhai Tang | Shanming Yao | Xiaofeng Zhang | Junliang Du
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Vision–Language Models (LVLMs) have garnered substantial interest owing to their impressive ability to interpret visual inputs and converse with users. Nevertheless, LVLMs still suffer from object hallucination, that is, generating descriptions of objects that are absent from the image, which undermines reliability and hinders real-world deployment. We propose DAPE-BR, a positional-alignment scheme that (i) preserves the pretrained weight order while globally [...] visual–text distances, (ii) embeds an isotropic fused patch-distance metric, and (iii) applies a patch-distance causal mask to enforce spatial causality. Extensive experiments on POPE, MMStar and SQA show that DAPE-BR consistently reduces hallucinations and boosts [...].
MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models
Mingrui Xie | Lulu Xu | Junliang Du
Findings of the Association for Computational Linguistics: EMNLP 2025
Vision-language models like CLIP demonstrate exceptional generalization capabilities but face significant adaptation challenges due to parameter scale, prompt sensitivity, and cross-modal alignment difficulties. Existing approaches primarily focus on single-modality adjustments, leading to suboptimal alignment and limited generalization. We introduce MAFMO, a plug-and-play framework comprising: (1) a Harmonic Cross-Modal Adapter enabling efficient cross-modal knowledge transfer; (2) a Meta-Template Optimization module dynamically generating input-dependent templates; and (3) a Cross-Modal Knowledge Synthesis mechanism preserving critical structural relationships during adaptation. Extensive experiments across multiple fine-grained visual recognition benchmarks demonstrate MAFMO consistently improves existing methods’ performance on both novel classes and harmonic mean, while maintaining robustness under various challenging conditions with minimal computational overhead.
Co-authors
- Junliang Du 2
- Qianhai Tang 1
- Tianxiang Xu 1
- Lulu Xu 1
- Shanming Yao 1