@inproceedings{sahu-etal-2025-mining,
title = "Mining Contextualized Visual Associations from Images for Creativity Understanding",
author = "Sahu, Ananya and
Ananthram, Amith and
McKeown, Kathleen",
editor = "Flek, Lucie and
Narayan, Shashi and
Phương, Lê Hồng and
Pei, Jiahuan",
booktitle = "Proceedings of the 18th International Natural Language Generation Conference",
month = oct,
year = "2025",
address = "Hanoi, Vietnam",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/author-page-you-zhang-rochester/2025.inlg-main.11/",
pages = "165--181",
abstract = "Understanding another person{'}s creative output requires a shared language of association. However, when training vision-language models such as CLIP, we rely on web-scraped datasets containing short, predominantly literal, alt-text. In this work, we introduce a method for mining contextualized associations for salient visual elements in an image that can scale to any unlabeled dataset. Given an image, we can use these mined associations to generate high quality creative captions at increasing degrees of abstraction. With our method, we produce a new dataset of visual associations and 1.7m creative captions for the images in MSCOCO. Human evaluation confirms that these captions remain visually grounded while exhibiting recognizably increasing abstraction. Moreover, fine-tuning a visual encoder on this dataset yields meaningful improvements in zero-shot image-text retrieval in two creative domains: poetry and metaphor visualization. We release our dataset, our generation code and our models for use by the broader community."
}