A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation
Àlex R. Atrio, Alexis Allemann, Ljiljana Dolamic, Andrei Popescu-Belis
Abstract
Training neural MT systems for low-resource language pairs or in unsupervised settings (i.e. with no parallel data) often involves a large number of auxiliary systems. These may include parent systems trained on higher-resource pairs and used for initializing the parameters of child systems, multilingual systems for neighboring languages, and several stages of systems trained on pseudo-parallel data obtained through back-translation. We propose here a simplified pipeline, which we compare to the best submissions to the WMT 2021 Shared Task on Unsupervised MT and Very Low Resource Supervised MT. Our pipeline needs only two parents, two children, one round of back-translation for low-resource directions, and two rounds for unsupervised ones, and it obtains better or similar scores when compared to more complex alternatives.
- Anthology ID:
- 2023.loresmt-1.4
- Volume:
- Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
- Venue:
- LoResMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 47–58
- URL:
- https://aclanthology.org/2023.loresmt-1.4
- DOI:
- 10.18653/v1/2023.loresmt-1.4
- Cite (ACL):
- Àlex R. Atrio, Alexis Allemann, Ljiljana Dolamic, and Andrei Popescu-Belis. 2023. A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation. In Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023), pages 47–58, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation (Atrio et al., LoResMT 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2023.loresmt-1.4.pdf