Multimodal Neural Machine Translation: A Survey of the State of the Art

Yi Feng, Chuanyi Li, Jiatong He, Zhenyu Hou, Vincent Ng


Abstract
Multimodal neural machine translation (MNMT) has received increasing attention due to its widespread applications in various fields such as cross-border e-commerce and cross-border social media platforms. The task aims to integrate other modalities, such as the visual modality, with textual data to enhance translation performance. We survey the major milestones in MNMT research, providing a comprehensive overview of relevant datasets and recent methodologies, and discussing key challenges and promising research directions.
Anthology ID: 2025.emnlp-main.1125
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 22141–22158
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1125/
Cite (ACL): Yi Feng, Chuanyi Li, Jiatong He, Zhenyu Hou, and Vincent Ng. 2025. Multimodal Neural Machine Translation: A Survey of the State of the Art. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22141–22158, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Multimodal Neural Machine Translation: A Survey of the State of the Art (Feng et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1125.pdf
Checklist: 2025.emnlp-main.1125.checklist.pdf