TIU-Bench: A Benchmark for Evaluating Large Multimodal Models on Text-rich Image Understanding

Kun Zhang, Liqiang Niu, Zhen Cao, Fandong Meng, Jie Zhou


Abstract
Text-rich images are ubiquitous in real-world applications, serving as a critical medium for conveying complex information and facilitating accessibility. Despite recent advances driven by Multimodal Large Language Models (MLLMs), existing benchmarks suffer from limited scale, fragmented scenarios, and evaluation protocols that fail to fully capture holistic image understanding. To address these gaps, we present TIU-Bench, a large-scale, multilingual benchmark comprising over 100,000 full-image annotations and 22,000 rigorously validated question-answer (QA) pairs that span 18 subtasks across diverse real-world scenarios. TIU-Bench introduces a novel full-image structured output format that jointly models geometric, textual, and relational information, enabling fine-grained evaluation of perception and reasoning capabilities. Furthermore, we propose a two-stage understanding framework named T2TIU, which first generates a structured representation of the entire image and subsequently conducts reasoning on this representation to address complex visual-textual queries. Extensive experiments on 10 state-of-the-art generative models highlight the challenges and opportunities in advancing text-rich image understanding. Our benchmark and framework provide a comprehensive platform for developing and evaluating next-generation multimodal AI systems.
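To make the two-stage idea concrete, here is a minimal Python sketch of a full-image structured representation that carries geometry (bounding boxes), text, and relations, followed by a second step that answers a query from that representation rather than from pixels. Everything in it is a hypothetical illustration: the names `Element`, `FullImageStructure`, `answer_from_structure`, and the `key_value` relation label are assumptions, not the TIU-Bench output format or the T2TIU implementation. Stage 1 (a model transcribing the whole image) is assumed to have already run, and the rule-based lookup merely stands in for the model-based reasoning the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """One text region: geometric info (bbox) plus its transcribed text."""
    eid: int
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    text: str


@dataclass
class FullImageStructure:
    """Hypothetical full-image representation: elements plus typed relations."""
    elements: list[Element]
    # Each relation is (source element id, relation label, target element id).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the structure as plain text that a stage-2 reasoner could read."""
        lines = [f"[{e.eid}] {e.bbox} {e.text}" for e in self.elements]
        lines += [f"{s} --{label}--> {d}" for s, label, d in self.relations]
        return "\n".join(lines)


def answer_from_structure(doc: FullImageStructure, key: str) -> Optional[str]:
    """Stage-2 stand-in: follow a 'key_value' relation instead of re-reading the image."""
    by_id = {e.eid: e for e in doc.elements}
    for src, label, dst in doc.relations:
        if label == "key_value" and by_id[src].text == key:
            return by_id[dst].text
    return None  # no matching key in the structure


# Assume stage 1 produced this structure from a receipt image.
receipt = FullImageStructure(
    elements=[
        Element(0, (10, 10, 80, 30), "Total"),
        Element(1, (200, 10, 260, 30), "$42.50"),
    ],
    relations=[(0, "key_value", 1)],
)
print(answer_from_structure(receipt, "Total"))  # → $42.50
print(receipt.to_prompt())
```

Keeping relations as explicit edges between element ids means a question such as "what is the total?" can be resolved by following a link, which is one plausible reading of why a structured intermediate representation helps with relational queries.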
Anthology ID: 2025.findings-emnlp.1318
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24286–24295
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1318/
DOI: 10.18653/v1/2025.findings-emnlp.1318
Cite (ACL):
Kun Zhang, Liqiang Niu, Zhen Cao, Fandong Meng, and Jie Zhou. 2025. TIU-Bench: A Benchmark for Evaluating Large Multimodal Models on Text-rich Image Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24286–24295, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TIU-Bench: A Benchmark for Evaluating Large Multimodal Models on Text-rich Image Understanding (Zhang et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1318.pdf
Checklist: 2025.findings-emnlp.1318.checklist.pdf