@inproceedings{ge-etal-2025-mrfd,
  title     = {{MRFD}: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in {LVLM}s},
  author    = {Ge, Haonan and
               Wang, Yiwei and
               Yang, Ming-Hsuan and
               Cai, Yujun},
  editor    = {Christodoulopoulos, Christos and
               Chakraborty, Tanmoy and
               Rose, Carolyn and
               Peng, Violet},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  month     = nov,
  year      = {2025},
  address   = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.findings-emnlp.858/},
  doi       = {10.18653/v1/2025.findings-emnlp.858},
  pages     = {15860--15879},
  isbn      = {979-8-89176-335-7},
  abstract  = {Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations{---}text that is inconsistent with visual input, due to the limited ability to verify information in different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates initial responses for each, and computes reliability weights based on Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring model updates.},
}

@comment{
  Markdown (Informal) citation text from the Anthology page export, kept for reference:
  [MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs](https://aclanthology.org/2025.findings-emnlp.858/) (Ge et al., Findings 2025)
  ACL
}