IITP-MT at WAT2021: Indic-English Multilingual Neural Machine Translation using Romanized Vocabulary
Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal, Pushpak Bhattacharyya
Abstract
This paper describes the systems submitted to the WAT 2021 MultiIndicMT shared task by the IITP-MT team. We submit two multilingual Neural Machine Translation (NMT) systems (Indic-to-English and English-to-Indic). We romanize all Indic data and create a subword vocabulary which is shared between all Indic languages. We use a back-translation approach to generate synthetic data, which is appended to the parallel corpus and used to train our models. The models are evaluated using BLEU, RIBES and AMFM scores, with the Indic-to-English model achieving 40.08 BLEU for the Hindi-English pair and the English-to-Indic model achieving 34.48 BLEU for the English-Hindi pair. However, we observe that the shared romanized subword vocabulary is not helping the English-to-Indic model at the time of generation, leading it to produce poor-quality translations for Tamil, Telugu and Malayalam to English pairs, with BLEU scores of 8.51, 6.25 and 3.79 respectively.
- Anthology ID:
- 2021.wat-1.29
- Volume:
- Proceedings of the 8th Workshop on Asian Translation (WAT2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
- Venue:
- WAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 238–243
- URL:
- https://aclanthology.org/2021.wat-1.29
- DOI:
- 10.18653/v1/2021.wat-1.29
- Cite (ACL):
- Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2021. IITP-MT at WAT2021: Indic-English Multilingual Neural Machine Translation using Romanized Vocabulary. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 238–243, Online. Association for Computational Linguistics.
- Cite (Informal):
- IITP-MT at WAT2021: Indic-English Multilingual Neural Machine Translation using Romanized Vocabulary (Appicharla et al., WAT 2021)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2021.wat-1.29.pdf