MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation

Zehang Wei, JiaXin Dai, Jiamin Yan, Xiang Xiang


Abstract
While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Existing mitigation pipelines also face an intervention paradox: static rules tend to disrupt accurate generations unnecessarily, whereas leaving multimodal reasoning completely unguided allows existing mismatches to cascade into severe logical fabrications. To quantify and mitigate these hallucinations, we propose MODE-RAG, a multi-agent system that uses Variational Free Energy (VFE) and internal attention states to dynamically gate interventions. High-risk queries are routed to five stage-specific agents, which integrate Monte Carlo Tree Search (MCTS) for rigorous causal derivation and logit perturbations to penalize sycophancy. Dedicated Correction and Overseer agents ensure formatting stability and perform post-hoc factual verification. To evaluate our approach objectively, we introduce ModeVent, a challenging subset derived from the MultiVent dataset. Extensive experiments indicate that our system reduces hallucination rates and logical fabrication, significantly improving the robustness of M-RAG systems.
Anthology ID: 2026.magmar-main.6
Volume: Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026)
Month: July
Year: 2026
Address: San Diego, USA
Editors: Kenton Murray, Reno Kriz
Venues: MAGMaR | WS
Publisher: Association for Computational Linguistics
Pages: 11–26
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.6/
Cite (ACL): Zehang Wei, JiaXin Dai, Jiamin Yan, and Xiang Xiang. 2026. MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation. In Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026), pages 11–26, San Diego, USA. Association for Computational Linguistics.
Cite (Informal): MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation (Wei et al., MAGMaR 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.6.pdf