ComicVQA: A Benchmark for Visual Reasoning in Multimodal LLMs
Esther Gan, Hannah Brown, David Herel, Kenji Kawaguchi, Min-Yen Kan, Michael Qizhe Shieh
Abstract
We introduce Comic Visual Question Answering (ComicVQA), a comics-based benchmark for evaluating multimodal large language models (MLLMs) on visual reasoning. ComicVQA comprises two tasks: (i) Missing Panel Prediction, which tests fine-grained visual grounding, and (ii) Panel Sorting, which evaluates sequential narrative understanding. Proprietary models achieve up to 62.6% on Missing Panel Prediction and 46.4% on Panel Sorting, whereas open-source models reach only 47.7% and 26.9%, respectively. In contrast, human annotators achieve over 83% accuracy on both tasks, revealing a large gap between current models and human-level multimodal understanding of comics. Through controlled ordering ablations and a detailed error taxonomy, we show that current MLLMs rely primarily on coarse temporal cues and struggle with fine-grained visual reasoning. These findings establish ComicVQA as a diagnostic benchmark for advancing multimodal visual reasoning in comics.
- Anthology ID:
- 2026.findings-acl.1268
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 25347–25370
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1268/
- Cite (ACL):
- Esther Gan, Hannah Brown, David Herel, Kenji Kawaguchi, Min-Yen Kan, and Michael Qizhe Shieh. 2026. ComicVQA: A Benchmark for Visual Reasoning in Multimodal LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 25347–25370, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- ComicVQA: A Benchmark for Visual Reasoning in Multimodal LLMs (Gan et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1268.pdf