NICT-5’s Submission To WAT 2021: MBART Pre-training And In-Domain Fine Tuning For Indic Languages

Raj Dabre, Abhisek Chakrabarty


Abstract
In this paper, we describe our submission to the multilingual Indic language translation task “MultiIndicMT” under the team name “NICT-5”. This task involves translation from 10 Indic languages into English and vice versa. The objective of the task was to explore the utility of multilingual approaches using a variety of in-domain and out-of-domain parallel and monolingual corpora. Given the recent success of multilingual NMT pre-training, we decided to explore pre-training an MBART model on a large monolingual corpus collection covering all languages in this task, followed by multilingual fine-tuning on small in-domain corpora. First, we observed that a small amount of pre-training followed by fine-tuning on small bilingual corpora can yield large gains over models trained without pre-training. Furthermore, multilingual fine-tuning yields additional gains in translation quality and significantly outperforms a very strong multilingual baseline that does not rely on any pre-training.
Anthology ID:
2021.wat-1.23
Volume:
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month:
August
Year:
2021
Address:
Online
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
198–204
URL:
https://aclanthology.org/2021.wat-1.23
DOI:
10.18653/v1/2021.wat-1.23
Cite (ACL):
Raj Dabre and Abhisek Chakrabarty. 2021. NICT-5’s Submission To WAT 2021: MBART Pre-training And In-Domain Fine Tuning For Indic Languages. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 198–204, Online. Association for Computational Linguistics.
Cite (Informal):
NICT-5’s Submission To WAT 2021: MBART Pre-training And In-Domain Fine Tuning For Indic Languages (Dabre & Chakrabarty, WAT 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.wat-1.23.pdf