GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation

Jiafeng Xiong, Yuting Zhao

Abstract
Multimodal Machine Translation (MMT) has shown that visual information can significantly help machine translation. However, existing MMT methods face challenges in leveraging the modality gap by enforcing rigid visual-linguistic alignment whilst being confined to inference within their trained multimodal domains. In this work, we construct novel multimodal scene graphs to preserve and integrate modality-specific information and introduce GIIFT, a two-stage Graph-guided Inductive Image-Free MMT framework that uses a cross-modal Graph Attention Network adapter to learn multimodal knowledge in a unified fused space and inductively generalize it to broader image-free translation domains. Experimental results on the English-to-French and English-to-German tasks of the Multi30K dataset demonstrate that GIIFT surpasses existing approaches and achieves state-of-the-art results, even without images during inference. Results on the WMT benchmark show significant improvements over image-free translation baselines, demonstrating the strength of GIIFT for inductive image-free inference.
Anthology ID:
2025.wmt-1.6
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
98–112
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.6/
Cite (ACL):
Jiafeng Xiong and Yuting Zhao. 2025. GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation. In Proceedings of the Tenth Conference on Machine Translation, pages 98–112, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation (Xiong & Zhao, WMT 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.6.pdf