Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation

Tosho Hirasawa; Emanuele Bugliarello; Desmond Elliott; Mamoru Komachi

doi:10.18653/v1/2023.wmt-1.47

Abstract

Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of source-language text, target-language text, and images. Obtaining these data requires expensive human annotation, making it difficult to develop models for unseen text-only language pairs. In this work, we propose the task of zero-shot cross-modal machine translation, which aims to transfer multimodal knowledge from an existing multimodal parallel corpus to a new translation direction. We also introduce a novel MMT model with a visual prediction network that learns visual features grounded on multimodal parallel data and provides pseudo-features for text-only language pairs. With this training paradigm, our MMT model outperforms its text-only counterpart. In our extensive analyses, we show that (i) the selection of visual features is important, and (ii) training on image-aware translations and being grounded on a similar language pair are mandatory.

Anthology ID: 2023.wmt-1.47
Volume: Proceedings of the Eighth Conference on Machine Translation
Month: December
Year: 2023
Address: Singapore
Editors: Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 522–535
URL: https://aclanthology.org/2023.wmt-1.47
DOI: 10.18653/v1/2023.wmt-1.47
Cite (ACL): Tosho Hirasawa, Emanuele Bugliarello, Desmond Elliott, and Mamoru Komachi. 2023. Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation. In Proceedings of the Eighth Conference on Machine Translation, pages 522–535, Singapore. Association for Computational Linguistics.
Cite (Informal): Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation (Hirasawa et al., WMT 2023)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.wmt-1.47.pdf
