Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain
Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay
Abstract
Incorporating multiple input modalities into a machine translation (MT) system is gaining popularity among MT researchers. In the publicly available datasets for Multimodal Machine Translation (MMT), the captions are short image descriptions. News captions, by contrast, describe the contents of the images in more detail, so they contain numerous named entities referring to specific persons, locations, etc. In this paper, we acquire two monolingual news datasets, one reported in English and one in Hindi, paired with images, and use them to generate a synthetic English-Hindi parallel corpus. We use this parallel corpus to train an English-Hindi Neural Machine Translation (NMT) system. We also train an English-Hindi MMT system by incorporating the image features paired with the corresponding parallel corpus. We then conduct a systematic analysis to evaluate the English-Hindi MT systems 1) with more synthetic data and 2) with added back-translated data. Our findings show improvements in BLEU score for both the NMT (+8.05) and MMT (+11.03) systems.
- Anthology ID:
- 2021.mmtlrl-1.4
- Volume:
- Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Online (Virtual Mode)
- Venue:
- MMTLRL
- Publisher:
- INCOMA Ltd.
- Pages:
- 20–29
- URL:
- https://aclanthology.org/2021.mmtlrl-1.4
- Cite (ACL):
- Loitongbam Sanayai Meetei, Thoudam Doren Singh, and Sivaji Bandyopadhyay. 2021. Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain. In Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021), pages 20–29, Online (Virtual Mode). INCOMA Ltd.
- Cite (Informal):
- Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain (Sanayai Meetei et al., MMTLRL 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.mmtlrl-1.4.pdf