Shanming Yao
2025
DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs
Mingrui Xie
|
Tianxiang Xu
|
Qianhai Tang
|
Shanming Yao
|
Xiaofeng Zhang
|
Junliang Du
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Vision–Language Models (LVLMs) have garnered substantial interest owing to their impressive ability to interpret visual inputs and converse with users.Nevertheless, LVLMs still suffer from object hallucination – generating descriptions for objects that are absent from the image, which undermines reliability and hinders real-world deployment. We propose DAPE-BR, a positional-alignment scheme that (i) preserves the pretrained weight order while globally—- visual–text distances, (ii) embeds an isotropic fused patch-distance metric, and (iii) applies a patch-distance causal mask to enforce spatial causality. Extensive experiments on POPE, MMStar and SQA show that DAPE-BR consistently reduces hallucinations and boosts.