Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data

Koel Dutta Chowdhury; Mohammed Hasanuzzaman; Qun Liu

doi:10.18653/v1/W18-3405

Abstract

In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus that contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.

Anthology ID: W18-3405
Volume: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month: July
Year: 2018
Address: Melbourne
Editors: Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33–42
URL: https://preview.aclanthology.org/iwcs-25-ingestion/W18-3405/
DOI: 10.18653/v1/W18-3405
Cite (ACL): Koel Dutta Chowdhury, Mohammed Hasanuzzaman, and Qun Liu. 2018. Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 33–42, Melbourne. Association for Computational Linguistics.
Cite (Informal): Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data (Dutta Chowdhury et al., ACL 2018)
PDF: https://preview.aclanthology.org/iwcs-25-ingestion/W18-3405.pdf
