ComicScene154: A Scene Dataset for Comic Analysis

Sandro Paval, Pascal Meißner, Ivan P. Yamshchikov


Abstract
Comics offer a compelling yet under-explored domain for computational narrative analysis, combining text and imagery in ways distinct from purely textual or audiovisual media. We introduce ComicScene154, a manually annotated dataset of scene-level narrative arcs derived from public-domain comic books spanning diverse genres. By conceptualizing comics as an abstraction for narrative-driven, multimodal data, we highlight their potential to inform broader research on multimodal storytelling. To demonstrate the utility of ComicScene154, we present a baseline scene segmentation pipeline, providing an initial benchmark that future studies can build upon. Our results indicate that ComicScene154 constitutes a valuable resource for advancing computational methods in multimodal narrative understanding and expanding the scope of comic analysis within the Natural Language Processing community.
Anthology ID:
2025.emnlp-main.1608
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
31562–31568
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1608/
Cite (ACL):
Sandro Paval, Pascal Meißner, and Ivan P. Yamshchikov. 2025. ComicScene154: A Scene Dataset for Comic Analysis. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31562–31568, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ComicScene154: A Scene Dataset for Comic Analysis (Paval et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1608.pdf
Checklist:
 2025.emnlp-main.1608.checklist.pdf