DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs
Mingrui Xie, Tianxiang Xu, Qianhai Tang, Shanming Yao, Xiaofeng Zhang, Junliang Du
Abstract
Large Vision–Language Models (LVLMs) have garnered substantial interest owing to their impressive ability to interpret visual inputs and converse with users. Nevertheless, LVLMs still suffer from object hallucination, i.e., generating descriptions of objects that are absent from the image, which undermines reliability and hinders real-world deployment. We propose DAPE-BR, a positional-alignment scheme that (i) preserves the pretrained weight order while globally aligning visual–text distances, (ii) embeds an isotropic fused patch-distance metric, and (iii) applies a patch-distance causal mask to enforce spatial causality. Extensive experiments on POPE, MMStar and SQA show that DAPE-BR consistently reduces hallucinations and boosts performance.
- Anthology ID: 2025.findings-emnlp.459
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8638–8649
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.459/
- DOI: 10.18653/v1/2025.findings-emnlp.459
- Cite (ACL): Mingrui Xie, Tianxiang Xu, Qianhai Tang, Shanming Yao, Xiaofeng Zhang, and Junliang Du. 2025. DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8638–8649, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): DAPE-BR: Distance-Aware Positional Encoding for Mitigating Object Hallucination in LVLMs (Xie et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.459.pdf
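The abstract names a distance-aware positional encoding built from an isotropic patch-distance metric, but this page does not detail the construction. As a rough, generic illustration of the underlying idea only (not the authors' DAPE-BR implementation; the function name and the linear additive-bias form are assumptions), a distance-aware attention bias over image patches might be sketched as:

```python
import numpy as np

def patch_distance_bias(grid_h, grid_w, scale=1.0):
    """Hypothetical sketch: isotropic Euclidean distances between patch
    positions of a grid_h x grid_w image grid, turned into an additive
    attention bias where more distant patches receive a larger penalty.
    Illustrates distance-aware positional encoding in general, NOT the
    specific DAPE-BR method of the paper."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    # Pairwise Euclidean distance matrix (N x N); isotropic in x and y,
    # so horizontal and vertical offsets are treated identically.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Negative bias added to attention logits: attenuates with distance.
    return -scale * dist

# Bias matrix for a tiny 2x2 patch grid (4 patches total).
bias = patch_distance_bias(2, 2)
```

Self-distances are zero, so each patch attends to itself unpenalized, while diagonal neighbors in the 2x2 grid receive a penalty of sqrt(2) times the scale.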