@inproceedings{zhao-etal-2020-double,
    title = "Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions",
    author = "Zhao, Yuting  and
      Komachi, Mamoru  and
      Kajiwara, Tomoyuki  and
      Chu, Chenhui",
    editor = "Martins, Andr{\'e}  and
      Moniz, Helena  and
      Fumega, Sara  and
      Martins, Bruno  and
      Batista, Fernando  and
      Coheur, Luisa  and
      Parra, Carla  and
      Trancoso, Isabel  and
      Turchi, Marco  and
      Bisazza, Arianna  and
      Moorkens, Joss  and
      Guerberof, Ana  and
      Nurminen, Mary  and
      Marg, Lena  and
      Forcada, Mikel L.",
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.eamt-1.12/",
    pages = "105--114",
    abstract = "Existing studies on multimodal neural machine translation (MNMT) have mainly focused on the effect of combining visual and textual modalities to improve translations. However, it has been suggested that the visual modality is only marginally beneficial. Conventional visual attention mechanisms have been used to select the visual features from equally-sized grids generated by convolutional neural networks (CNNs), and may have had modest effects on aligning the visual concepts associated with textual objects, because the grid visual features do not capture semantic information. In contrast, we propose the application of semantic image regions for MNMT by integrating visual and textual features using two individual attention mechanisms (double attention). We conducted experiments on the Multi30k dataset and achieved improvements of 0.5 and 0.9 BLEU points on the English-German and English-French translation tasks, respectively, compared with MNMT using grid visual features. We also demonstrated concrete improvements in translation performance resulting from semantic image regions."
}