REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark

Navve Wasserman; Roi Pony; Oshri Naparstek; Adi Raz Goldfarb; Eli Schwartz; Udi Barzelay; Leonid Karlinsky

Abstract

Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet the design of existing benchmarks does not fully capture real-world challenges. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries, and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models’ semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and in robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on the REAL-MM-RAG benchmark. Our work offers a better way to evaluate and improve retrieval in multi-modal RAG systems while also providing training data and models that address current limitations.

Anthology ID: 2025.acl-long.1528
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 31660–31683
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1528/
Cite (ACL): Navve Wasserman, Roi Pony, Oshri Naparstek, Adi Raz Goldfarb, Eli Schwartz, Udi Barzelay, and Leonid Karlinsky. 2025. REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31660–31683, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark (Wasserman et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1528.pdf
