Imagination and Contemplation: A Balanced Framework for Semantic-Augmented Multimodal Machine Translation

Zhuang Yu, Shiliang Sun, Jing Zhao, Tengfei Song, Hao Yang


Abstract
Multimodal Machine Translation (MMT) enhances textual translation through auxiliary inputs such as images, which is particularly effective in resolving linguistic ambiguities. However, visual information often introduces redundancy or noise, potentially impairing translation quality. To address this challenge, we propose a balanced semantic-augmented framework that integrates “Imagination” and “Contemplation” in multimodal understanding. Specifically, we first generate synthetic images from the source text and align them with the authentic images via an optimal transport (OT) loss to enhance visual-semantic consistency. A CLIP-based similarity gating mechanism is introduced to adaptively fuse visual features from both authentic and synthetic images during visual representation learning. To strengthen semantic grounding, a neural machine translation (NMT) branch is incorporated as a regularization signal, and a Kullback-Leibler (KL) divergence is applied between MMT and NMT outputs to mitigate modality mismatch. Furthermore, an image-text contrastive (ITC) loss aligns the final translations with image representations, reinforcing multimodal coherence. Experiments on multiple translation datasets with a diverse set of language pairs demonstrate that our framework outperforms existing baselines, particularly in cases with visually ambiguous or weakly correlated content.
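
To make two of the components named in the abstract more concrete, the following is a minimal NumPy sketch of similarity-based gating between authentic and synthetic image features, and of a KL term between MMT and NMT output distributions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the gating formula, the temperature, or the direction of the KL divergence, so the softmax-over-cosine-similarities gate, the `temperature` parameter, the KL(MMT || NMT) direction, and all function names here are hypothetical. In the paper the similarities come from CLIP; plain vectors stand in for CLIP embeddings below.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def gated_fusion(text_emb, real_feat, syn_feat, temperature=0.1):
    # Assumption: the gate is a softmax over text-image cosine similarities,
    # so the image that matches the source text better gets a larger weight.
    sims = np.array([cosine(text_emb, real_feat), cosine(text_emb, syn_feat)])
    weights = softmax(sims / temperature)
    fused = weights[0] * real_feat + weights[1] * syn_feat
    return fused, weights


def kl_divergence(mmt_logits, nmt_logits):
    # Assumption: KL(MMT || NMT) over a single token's vocabulary distribution.
    p = softmax(mmt_logits)
    q = softmax(nmt_logits)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))


if __name__ == "__main__":
    text = np.array([1.0, 0.0])
    real = np.array([1.0, 0.0])   # authentic image agrees with the text
    syn = np.array([0.0, 1.0])    # synthetic image does not
    fused, weights = gated_fusion(text, real, syn)
    print("gate weights:", weights)
    print("KL:", kl_divergence(np.array([2.0, 0.0, 0.0]), np.array([0.0, 0.0, 2.0])))
```

In this toy setup the gate leans almost entirely on the authentic image because it is the one aligned with the text; when the authentic image is weakly correlated with the source sentence, the same rule would shift weight toward the synthetic image.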
Anthology ID:
2025.findings-emnlp.579
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10913–10928
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.579/
DOI:
10.18653/v1/2025.findings-emnlp.579
Cite (ACL):
Zhuang Yu, Shiliang Sun, Jing Zhao, Tengfei Song, and Hao Yang. 2025. Imagination and Contemplation: A Balanced Framework for Semantic-Augmented Multimodal Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10913–10928, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Imagination and Contemplation: A Balanced Framework for Semantic-Augmented Multimodal Machine Translation (Yu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.579.pdf
Checklist:
 2025.findings-emnlp.579.checklist.pdf