Towards Automatic Evaluation for Image Transcreation

Simran Khanuja; Vivek Iyer; Xiaoyu He; Graham Neubig

Abstract

Beyond conventional paradigms of translating speech and text, there has recently been interest in automated transcreation of images to facilitate localization of visual content across different cultures. Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. Drawing on theories from translation studies and real-world transcreation practices, we identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence and visual similarity, and design our metrics to evaluate systems along these axes. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity. Meta-evaluation across 7 countries shows our metrics agree strongly with human ratings, with average segment-level correlations ranging from 0.55 to 0.87. Finally, through a discussion of the merits and demerits of each metric, we offer a robust framework for automated image transcreation evaluation, grounded in both theoretical foundations and practical application. Our code can be found here: https://github.com/simran-khanuja/automatic-eval-transcreation
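
The meta-evaluation described above compares metric scores with human ratings segment by segment and then averages across countries. The following is a minimal sketch of that kind of computation, not the authors' implementation: the abstract does not say which correlation coefficient is used, so Pearson correlation is assumed here, and the function names (`pearson`, `average_segment_correlation`), the country labels, and all numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_segment_correlation(scores_by_country):
    """Correlate metric scores with human ratings per country, then average.

    scores_by_country maps a country label to a pair
    (metric_scores, human_ratings), one entry per segment (image).
    """
    per_country = {
        country: pearson(metric, human)
        for country, (metric, human) in scores_by_country.items()
    }
    average = sum(per_country.values()) / len(per_country)
    return average, per_country


# Made-up illustrative numbers, not results from the paper.
toy_scores = {
    "country_a": ([0.1, 0.4, 0.5, 0.9], [1, 2, 3, 5]),
    "country_b": ([0.2, 0.3, 0.8, 0.7], [2, 1, 4, 5]),
}
average, per_country = average_segment_correlation(toy_scores)
print(f"average correlation: {average:.3f}")
for country, value in per_country.items():
    print(f"{country}: {value:.3f}")
```

A rank-based coefficient such as Spearman or Kendall could be substituted for `pearson` without changing the averaging step, if the human ratings are better treated as ordinal.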

Anthology ID: 2025.naacl-long.359
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 7034–7047
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.359/
Cite (ACL): Simran Khanuja, Vivek Iyer, Xiaoyu He, and Graham Neubig. 2025. Towards Automatic Evaluation for Image Transcreation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7034–7047, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Towards Automatic Evaluation for Image Transcreation (Khanuja et al., NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.359.pdf