Supervised Visual Attention for Multimodal Neural Machine Translation
Tetsuro Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote, Hideki Nakayama
Abstract
This paper proposes a supervised visual attention mechanism for multimodal neural machine translation (MNMT), trained with constraints based on manual alignments between words in a sentence and their corresponding regions of an image. The proposed visual attention mechanism captures the relationship between a word and an image region more precisely than a conventional visual attention mechanism trained through MNMT in an unsupervised manner. Our experiments on English-German and German-English translation tasks using the Multi30k dataset and on English-Japanese and Japanese-English translation tasks using the Flickr30k Entities JP dataset show that a Transformer-based MNMT model can be improved by incorporating our proposed supervised visual attention mechanism, and that further improvements can be achieved by combining it with a supervised cross-lingual attention mechanism (up to +1.61 BLEU, +1.7 METEOR).
- Anthology ID: 2020.coling-main.380
- Volume: Proceedings of the 28th International Conference on Computational Linguistics
- Month: December
- Year: 2020
- Address: Barcelona, Spain (Online)
- Editors: Donia Scott, Nuria Bel, Chengqing Zong
- Venue: COLING
- Publisher: International Committee on Computational Linguistics
- Pages: 4304–4314
- URL: https://aclanthology.org/2020.coling-main.380
- DOI: 10.18653/v1/2020.coling-main.380
- Cite (ACL): Tetsuro Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote, and Hideki Nakayama. 2020. Supervised Visual Attention for Multimodal Neural Machine Translation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4304–4314, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal): Supervised Visual Attention for Multimodal Neural Machine Translation (Nishihara et al., COLING 2020)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2020.coling-main.380.pdf
- Data: Flickr30K Entities, Flickr30k, Multi30K
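The abstract describes training visual attention with constraints from manual word-region alignments. The paper's exact loss is not given on this page, but a common way to supervise attention is to add a cross-entropy term between the predicted attention distribution and a gold alignment distribution. The sketch below illustrates that idea under that assumption; the function name and the `lambda_attn` weighting are hypothetical, not from the paper.

```python
# Minimal sketch of a supervised attention loss (an assumption: cross-entropy
# between predicted attention weights and gold word-region alignments; the
# paper's exact formulation is not shown on this page).
import numpy as np

def attention_supervision_loss(attn, gold, eps=1e-9):
    """Mean cross-entropy between predicted and gold attention distributions.

    attn : (num_words, num_regions) predicted attention weights, rows sum to 1
    gold : (num_words, num_regions) gold alignment distributions, rows sum to 1
    """
    return float(-np.sum(gold * np.log(attn + eps)) / attn.shape[0])

# Toy example: 2 source words attending over 3 image regions.
attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])
gold = np.array([[1.0, 0.0, 0.0],   # word 0 is manually aligned to region 0
                 [0.0, 1.0, 0.0]])  # word 1 is manually aligned to region 1
loss = attention_supervision_loss(attn, gold)
```

In training, such a term would typically be added to the translation loss with a weighting coefficient, e.g. `total = nmt_loss + lambda_attn * loss`, so that attention is pulled toward the manual alignments while the model still optimizes translation quality.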