A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization

Janak Kapuriya, Ali Hatami, Paul Buitelaar


Abstract
Recent advancements in text-to-image generative models have improved narrative consistency in story visualization. However, current story visualization models often overlook cultural dimensions, resulting in visuals that lack cultural fidelity. In this study, we present a progressive evaluation framework for story visualization and validate it on current text-to-image models across three languages (English, Hindi, and Chinese) and two datasets (VIST and FlintstonesSV). The framework introduces three levels of cultural analysis as evaluation rubrics: 1) Basic Cultural Criteria, 2) Cultural Dimension Guidance, and 3) Cultural Examples Grounding. We evaluate story visualization with a novel MLLM-as-Jury approach across all three rubrics, complemented by a small-scale human evaluation on the third rubric only. The MLLM-as-Jury approach aggregates scores from three different families of MLLM-as-Judge models. In our experiments, real-world stories generally receive higher cultural appropriateness scores than animated ones, and English tends to score higher than Hindi and Chinese across the evaluated models. Annotators also noted culturally inconsistent or stereotypical elements in some examples. The proposed framework thus provides early insights into cultural misalignments in story visualization. Code for this work is available at https://github.com/janak11111/Cultural_Eval_For_StoryViz
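
The jury aggregation mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the aggregation rule, so a plain mean over judges is assumed here, and the function name, rubric keys, judge labels, and scores are all hypothetical placeholders rather than reported results.

```python
from statistics import mean

# Rubric names follow the three levels listed in the abstract;
# the snake_case keys themselves are hypothetical.
RUBRICS = (
    "basic_cultural_criteria",
    "cultural_dimension_guidance",
    "cultural_examples_grounding",
)


def jury_score(judge_scores):
    """Combine per-rubric scores from several judges into one jury score.

    judge_scores maps a judge label to a dict of rubric -> numeric score.
    Averaging is an assumption; the paper may aggregate differently.
    """
    return {
        rubric: mean(scores[rubric] for scores in judge_scores.values())
        for rubric in RUBRICS
    }


# Illustrative scores from three hypothetical judge families (made-up values).
example_scores = {
    "judge_family_a": {
        "basic_cultural_criteria": 4.0,
        "cultural_dimension_guidance": 3.0,
        "cultural_examples_grounding": 2.0,
    },
    "judge_family_b": {
        "basic_cultural_criteria": 3.0,
        "cultural_dimension_guidance": 3.0,
        "cultural_examples_grounding": 4.0,
    },
    "judge_family_c": {
        "basic_cultural_criteria": 5.0,
        "cultural_dimension_guidance": 3.0,
        "cultural_examples_grounding": 3.0,
    },
}

jury = jury_score(example_scores)
```

Drawing judges from different model families is presumably meant to reduce the influence of any single model's bias on the final score; other aggregation rules (median, majority vote) would fit the same interface.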
Anthology ID:
2026.gem-main.39
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
410–427
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.39/
Cite (ACL):
Janak Kapuriya, Ali Hatami, and Paul Buitelaar. 2026. A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 410–427, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization (Kapuriya et al., GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.39.pdf