NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs

Kenji Imamura, Eiichiro Sumita


Abstract
In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021). A distinguishing feature of our system is its use of a pretrained multilingual BART (Bidirectional and Auto-Regressive Transformer; mBART) model. Because the publicly available models do not cover some of the languages in the NICT-SAP task, we added those languages to the mBART model and continued training it on monolingual corpora extracted from Wikipedia. We then fine-tuned the expanded mBART model on the parallel corpora specified by the NICT-SAP task. BLEU scores improved greatly over those of systems without the pretrained model, including for the added languages.
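The abstract describes expanding a pretrained mBART model's vocabulary with new language tokens before continued pretraining and fine-tuning. Purely as an illustration of that vocabulary-expansion step, here is a minimal sketch using the Hugging Face Transformers mBART API; the checkpoint name and language codes are assumptions for illustration, not the authors' actual setup or toolkit.

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Load a released mBART checkpoint ("facebook/mbart-large-cc25" is an
# assumed example; the authors' checkpoint may differ).
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# Hypothetical language-code tokens for languages missing from the
# released checkpoint's vocabulary.
new_lang_codes = ["th_TH", "ms_MY"]
tokenizer.add_tokens(new_lang_codes, special_tokens=True)

# Grow the embedding matrix so the new tokens get trainable rows.
model.resize_token_embeddings(len(tokenizer))

# From here, one would continue denoising pretraining on Wikipedia
# monolingual text for the added languages, then fine-tune on the
# task's parallel corpora, as the abstract describes.
```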
Anthology ID:
2021.wat-1.8
Volume:
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month:
August
Year:
2021
Address:
Online
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
90–95
URL:
https://aclanthology.org/2021.wat-1.8
DOI:
10.18653/v1/2021.wat-1.8
Cite (ACL):
Kenji Imamura and Eiichiro Sumita. 2021. NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 90–95, Online. Association for Computational Linguistics.
Cite (Informal):
NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs (Imamura & Sumita, WAT 2021)
PDF:
https://aclanthology.org/2021.wat-1.8.pdf