@inproceedings{hu-etal-2025-mcitebench,
title = "{MC}ite{B}ench: A Multimodal Benchmark for Generating Text with Citations",
author = "Hu, Caiyu and
Zhang, Yikai and
Zhu, Tinghui and
Ye, Yiwei and
Xiao, Yanghua",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.318/",
doi = "10.18653/v1/2025.findings-emnlp.318",
pages = "5949--5966",
ISBN = "979-8-89176-335-7",
abstract = "Multimodal Large Language Models (MLLMs) have advanced in integrating diverse modalities but frequently suffer from hallucination. A promising solution to mitigate this issue is to generate text with citations, providing a transparent chain for verification. However, existing work primarily focuses on generating citations for text-only content, leaving the challenges of multimodal scenarios largely unexplored. In this paper, we introduce MCiteBench, the first benchmark designed to assess the ability of MLLMs to generate text with citations in multimodal contexts. Our benchmark comprises data derived from academic papers and review-rebuttal interactions, featuring diverse information sources and multimodal content. Experimental results reveal that MLLMs struggle to ground their outputs reliably when handling multimodal input. Further analysis uncovers a systematic modality bias and reveals how models internally rely on different sources when generating citations, offering insights into model behavior and guiding future directions for multimodal citation tasks."
}