Daiheng Gao


2025

pdf bib
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG
Pingyu Wu | Daiheng Gao | Jing Tang | Huimin Chen | Wenbo Zhou | Weiming Zhang | Nenghai Yu
Findings of the Association for Computational Linguistics: NAACL 2025

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by using external knowledge, but it struggles with precise entity information retrieval. Our proposed **MES-RAG** framework enhances entity-specific query handling and provides accurate, secure, and consistent responses. MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access. Additionally, the system supports real-time multi-modal outputs, including text, images, audio, and video, seamlessly integrating into existing RAG architectures. Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering, increasing accuracy to **0.83 (+0.25)** on targeted task. Our code and data are available at https://github.com/wpydcr/MES-RAG.