Towards Multimodal Simultaneous Neural Machine Translation

Aizhan Imankulova, Masahiro Kaneko, Tosho Hirasawa, Mamoru Komachi


Abstract
Simultaneous translation involves translating a sentence before the speaker’s utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly more challenging than general full-sentence translation because of the shortage of input information available during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality. Our experiments with the Multi30k dataset showed that MSNMT significantly outperforms its text-only counterpart in low-latency settings, where translations must be produced before the full source sentence is available. Furthermore, we verified the importance of visual information during decoding by performing an adversarial evaluation of MSNMT, in which we studied how the models behaved with incongruent input modalities and analyzed the effect of word-order differences between the source and target languages.
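For readers unfamiliar with how simultaneous decoding can interact with an extra visual modality, the following Python sketch illustrates a wait-k style policy, a common formulation of simultaneous translation: after an initial wait of k source tokens, the decoder emits one target token per newly revealed source token, conditioning every step on a global image feature. DummyModel and decode_step are hypothetical stand-ins, not the authors' implementation (see the code link below).

# Minimal sketch of multimodal wait-k simultaneous decoding.
# DummyModel and its decode_step method are hypothetical stand-ins
# for an attention-based NMT model fusing text and image features;
# they are not the paper's actual implementation.

class DummyModel:
    def decode_step(self, visible_src, image_feature, target_prefix):
        # Toy behavior: "translate" by copying the next visible source
        # token; a real model would attend over visible_src and over
        # image_feature to predict the next target token.
        i = len(target_prefix)
        return visible_src[i] if i < len(visible_src) else "</s>"

def waitk_translate(model, source_tokens, image_feature, k=3, max_len=50):
    """Wait-k policy: after waiting for the first k source tokens,
    emit one target token for each newly arrived source token.
    Every decoding step also sees the image feature, so the visual
    modality can compensate for the still-missing source suffix."""
    target = []
    for t in range(max_len):
        # The decoder may only see the first k + t source tokens.
        visible = source_tokens[: min(k + t, len(source_tokens))]
        token = model.decode_step(visible, image_feature, target)
        if token == "</s>":
            break
        target.append(token)
    return target

# Usage: with k=2 the decoder starts emitting after only two source
# tokens have been read, long before the sentence is complete.
src = "a man rides a bike down the street".split()
print(waitk_translate(DummyModel(), src, image_feature=None, k=2))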
Anthology ID:
2020.wmt-1.70
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
594–603
URL:
https://aclanthology.org/2020.wmt-1.70
Cite (ACL):
Aizhan Imankulova, Masahiro Kaneko, Tosho Hirasawa, and Mamoru Komachi. 2020. Towards Multimodal Simultaneous Neural Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 594–603, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Multimodal Simultaneous Neural Machine Translation (Imankulova et al., WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.70.pdf
Video:
https://slideslive.com/38939559
Code:
toshohirasawa/mst