Transfer Learning with Shallow Decoders: BSC at WMT2021’s Multilingual Low-Resource Translation for Indo-European Languages Shared Task

Ksenia Kharitonova, Ona de Gibert Bonet, Jordi Armengol-Estapé, Mar Rodriguez i Alvarez, Maite Melero


Abstract
This paper describes the participation of the BSC team in the WMT2021’s Multilingual Low-Resource Translation for Indo-European Languages Shared Task. The system aims to solve the Subtask 2: Wikipedia cultural heritage articles, which involves translation in four Romance languages: Catalan, Italian, Occitan and Romanian. The submitted system is a multilingual semi-supervised machine translation model. It is based on a pre-trained language model, namely XLM-RoBERTa, that is later fine-tuned with parallel data obtained mostly from OPUS. Unlike other works, we only use XLM to initialize the encoder and randomly initialize a shallow decoder. The reported results are robust and perform well for all tested languages.
Anthology ID:
2021.wmt-1.43
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
362–367
Language:
URL:
https://aclanthology.org/2021.wmt-1.43
DOI:
Bibkey:
Cite (ACL):
Ksenia Kharitonova, Ona de Gibert Bonet, Jordi Armengol-Estapé, Mar Rodriguez i Alvarez, and Maite Melero. 2021. Transfer Learning with Shallow Decoders: BSC at WMT2021’s Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In Proceedings of the Sixth Conference on Machine Translation, pages 362–367, Online. Association for Computational Linguistics.
Cite (Informal):
Transfer Learning with Shallow Decoders: BSC at WMT2021’s Multilingual Low-Resource Translation for Indo-European Languages Shared Task (Kharitonova et al., WMT 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.wmt-1.43.pdf
Code
 temu-bsc/wmt2021-indoeuropean