FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain

Suifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao


Abstract
Retrieval-Augmented Generation (RAG) plays a vital role in the financial domain, powering applications such as real-time market analysis, trend forecasting, and interest rate computation. However, most existing RAG research in finance focuses predominantly on textual data, overlooking the rich visual content in financial documents and thereby losing key analytical insights. To bridge this gap, we present FinRAGBench-V, a comprehensive visual RAG benchmark tailored for finance. The benchmark integrates multimodal data and provides visual citation to ensure traceability. It includes a bilingual retrieval corpus with 60,780 Chinese and 51,219 English pages, along with a high-quality, human-annotated question-answering (QA) dataset spanning heterogeneous data types and seven question categories. Moreover, we introduce RGenCite, a RAG baseline that seamlessly integrates visual citation with generation. Furthermore, we propose an automatic citation evaluation method to systematically assess the visual citation capabilities of Multimodal Large Language Models (MLLMs). Extensive experiments with RGenCite underscore the challenging nature of FinRAGBench-V, providing valuable insights for the development of multimodal RAG systems in finance.
Anthology ID:
2025.emnlp-main.211
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4215–4249
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.211/
Cite (ACL):
Suifeng Zhao, Zhuoran Jin, Sujian Li, and Jun Gao. 2025. FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4215–4249, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain (Zhao et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.211.pdf
Checklist:
 2025.emnlp-main.211.checklist.pdf