ANVITA Machine Translation System for WAT 2021 MultiIndicMT Shared Task

Pavanpankaj Vegi, Sivabhavani J, Biswajit Paul, Chitra Viswanathan, Prasanna Kumar K R


Abstract
This paper describes the ANVITA-1.0 MT system, built by the mcairt team for submission to the WAT 2021 MultiIndicMT shared task. The team participated in 20 translation directions, English→Indic and Indic→English, where the Indic set comprised 10 Indian languages. ANVITA-1.0 consisted of two multilingual NMT models, one for the English→Indic directions and the other for the Indic→English directions, each with a shared encoder-decoder, together covering 10 language pairs and 20 translation directions. The base models were built on the Transformer architecture and trained on the MultiIndicMT WAT 2021 corpora. Back translation and transliteration were then employed for selective data augmentation, and model ensembling for better generalization. Additionally, the MultiIndicMT WAT 2021 corpora were distilled using a series of filtering operations before being used for training. On the official test set, ANVITA-1.0 achieved the highest AM-FM score for English→Bengali, the 2nd highest for English→Tamil, and the 3rd highest for the English→Hindi and Bengali→English directions. In general, the performance achieved by ANVITA for the Indic→English directions is relatively better than that for the English→Indic directions across all 10 language pairs when evaluated using BLEU and RIBES, although the same trend is not observed consistently under AM-FM based evaluation. Compared with BLEU, RIBES and AM-FM based scoring placed ANVITA relatively higher among the task participants.
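The abstract does not list the individual filtering operations, so the following is only a minimal sketch of the kind of parallel-corpus cleaning commonly applied before NMT training, not the authors' actual pipeline. The function name `filter_parallel`, the thresholds `max_len` and `max_ratio`, and the choice of filters (whitespace normalization, empty-side removal, length cap, length-ratio check, exact deduplication) are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def filter_parallel(
    pairs: Iterable[Tuple[str, str]],
    max_len: int = 200,      # assumed cap on tokens per side
    max_ratio: float = 3.0,  # assumed cap on longer/shorter token ratio
) -> Iterator[Tuple[str, str]]:
    """Yield cleaned (source, target) pairs; hypothetical filters only."""
    seen = set()
    for src, tgt in pairs:
        # Collapse runs of whitespace and trim both sides.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long sentences.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose lengths are badly mismatched.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicates (after normalization).
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt
```

Whitespace tokenization is used here only to keep the sketch self-contained; a real pipeline for Indic languages would more plausibly count subword or script-aware tokens.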
Anthology ID:
2021.wat-1.30
Volume:
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
244–249
URL:
https://aclanthology.org/2021.wat-1.30
DOI:
10.18653/v1/2021.wat-1.30
Cite (ACL):
Pavanpankaj Vegi, Sivabhavani J, Biswajit Paul, Chitra Viswanathan, and Prasanna Kumar K R. 2021. ANVITA Machine Translation System for WAT 2021 MultiIndicMT Shared Task. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 244–249, Online. Association for Computational Linguistics.
Cite (Informal):
ANVITA Machine Translation System for WAT 2021 MultiIndicMT Shared Task (Vegi et al., WAT 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2021.wat-1.30.pdf