T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Kaiyue Sun; Rongyao Fang; Chengqi Duan; Xian Liu; Aoxue Li; Xihui Liu

Abstract

Text-to-image (T2I) generative models have achieved remarkable progress, demonstrating exceptional capability in synthesizing high-quality images from textual prompts. While existing research and benchmarks have extensively evaluated the ability of T2I models to follow the literal meaning of prompts, their ability to reason over prompts with domain knowledge to uncover implicit meaning and contextual nuances remains underexplored. To bridge this gap, we introduce T2I-ReasonBench, a novel benchmark designed to explore the knowledge-driven reasoning capabilities of T2I models. T2I-ReasonBench comprises 800 meticulously designed prompts organized into four dimensions: (1) Idiom Interpretation, (2) Textual Image Design, (3) Entity Reasoning, and (4) Scientific Reasoning. These dimensions challenge models to integrate domain knowledge, infer implicit meaning, and resolve contextual ambiguities. To quantify performance, we introduce a two-stage evaluation framework: a large language model (LLM) generates prompt-specific question-criterion pairs that evaluate whether the image includes the essential elements resulting from correct reasoning; a multimodal LLM (MLLM) then scores the generated image against these criteria. Our comprehensive study across 16 state-of-the-art diffusion and unified multimodal models (UMMs) reveals two primary bottlenecks. First, many models lack the foundational reasoning ability to fully comprehend complex prompts. Second, even models with stronger reasoning modules exhibit a persistent gap between their internal understanding and the final generated image. This highlights an urgent need for the next generation of T2I systems not only to improve their reasoning capability but also to enhance integration between reasoning and synthesis.

Anthology ID: 2026.findings-acl.433
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8919–8944
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.433/
Cite (ACL): Kaiyue Sun, Rongyao Fang, Chengqi Duan, Xian Liu, Aoxue Li, and Xihui Liu. 2026. T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 8919–8944, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation (Sun et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.433.pdf
Checklist: 2026.findings-acl.433.checklist.pdf
