M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data

Mingyang Zhou; Lingyu Zhang; Sophia Horng; Maximillian Chen; Kung-Hsiang Huang; Shih-Fu Chang

M²-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data

Mingyang Zhou, Lingyu Zhang, Sophia Horng, Maximillian Chen, Kung-Hsiang Huang, Shih-Fu Chang

Abstract

Tabular data is used to store information in many real-world systems ranging from finance to healthcare. However, such structured data is often communicated to humans in visually interpretable formats (e.g. charts and textual paragraphs), making it imperative that fact-checking models should be able to reason over multiple pieces of structured evidence presented across different modalities. In this paper, we propose Multi-Document Multi-Modal Table-based Fact Verification (M²-TabFact), a challenging fact verification task that requires jointly reasoning over visual and textual representations of structured data. We design an automatic data generation pipeline that converts existing tabular data into descriptive visual and textual evidence. We then use Large Language Models to generate complex claims that depend on multi-document, multi-modal evidence. In total, we create 8,856 pairs of complex claims and multi-modal evidence through this procedure and systematically evaluate M²-TabFact with a set of strong vision-language models (VLM). We find that existing VLMs have large gaps in fact verification performance compared to humans. Moreover, we find that they are imbalanced when it comes to their ability to handle reason about different modalities, and currently struggle to reason about information extracted from multiple documents.

Anthology ID:: 2025.findings-acl.1345
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:: Findings | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 26239–26256
Language:
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1345/
DOI:
Bibkey:
Cite (ACL):: Mingyang Zhou, Lingyu Zhang, Sophia Horng, Maximillian Chen, Kung-Hsiang Huang, and Shih-Fu Chang. 2025. M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26239–26256, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data (Zhou et al., Findings 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1345.pdf

PDF Cite Search Fix data