Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Baijun Ji; Tong Zhang; Yicheng Zou; Bojie Hu; Si Shen

doi:10.18653/v1/2022.emnlp-main.453

Abstract

Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite promising performance, MMT models still suffer from the problem of input degradation: they focus more on textual information, while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective. Specifically, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach effectively enhances the visual awareness of MMT models and achieves superior results against strong baselines.

Anthology ID: 2022.emnlp-main.453
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6755–6764
URL: https://aclanthology.org/2022.emnlp-main.453
DOI: 10.18653/v1/2022.emnlp-main.453
Cite (ACL): Baijun Ji, Tong Zhang, Yicheng Zou, Bojie Hu, and Si Shen. 2022. Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6755–6764, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective (Ji et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2022.emnlp-main.453.pdf
