Abstract
Existing studies on multimodal neural machine translation (MNMT) have mainly focused on the effect of combining visual and textual modalities to improve translations. However, it has been suggested that the visual modality is only marginally beneficial. Conventional visual attention mechanisms have been used to select visual features from equally sized grids generated by convolutional neural networks (CNNs). These mechanisms may have had only modest effects on aligning visual concepts with their associated textual objects, because grid visual features do not capture semantic information. In contrast, we propose applying semantic image regions to MNMT by integrating visual and textual features using two individual attention mechanisms (double attention). We conducted experiments on the Multi30k dataset and achieved improvements of 0.5 and 0.9 BLEU points on the English-German and English-French translation tasks, respectively, compared with an MNMT model using grid visual features. We also demonstrated concrete improvements in translation performance resulting from semantic image regions.
- Anthology ID:
- 2020.eamt-1.12
- Volume:
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Lisboa, Portugal
- Editors:
- André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 105–114
- URL:
- https://aclanthology.org/2020.eamt-1.12
- Cite (ACL):
- Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, and Chenhui Chu. 2020. Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 105–114, Lisboa, Portugal. European Association for Machine Translation.
- Cite (Informal):
- Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions (Zhao et al., EAMT 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.eamt-1.12.pdf
- Data:
- Visual Genome
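The abstract describes integrating textual and visual features through two individual attention mechanisms ("double attention"). The following is a minimal sketch of that general idea, not the authors' implementation. It makes several assumptions: it uses plain dot-product attention (the paper may use a different scoring function), it treats each semantic image region as a feature vector with the same dimension as the source-text annotations, and it combines the two context vectors by simple concatenation. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    # Assumed dot-product attention: score each key against the query,
    # normalize the scores, and return the weighted sum of the keys.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context


def double_attention(decoder_state, text_annotations, region_features):
    # Two independent attention mechanisms driven by the same decoder state:
    # one over source-text annotations, one over image-region features.
    text_weights, text_context = attend(decoder_state, text_annotations)
    region_weights, region_context = attend(decoder_state, region_features)
    # Assumed fusion step: concatenate the textual and visual context vectors.
    return text_context + region_context, text_weights, region_weights


# Toy example with hypothetical 2-dimensional vectors.
decoder_state = [1.0, 0.0]
text_annotations = [[1.0, 0.0], [0.0, 1.0]]
region_features = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
context, text_weights, region_weights = double_attention(
    decoder_state, text_annotations, region_features
)
print(len(context), text_weights, region_weights)
```

Keeping the two attention distributions separate lets the decoder weight source words and image regions independently at each step; here the second region, whose features align best with the decoder state, receives the largest visual weight.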