NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task

Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell


Abstract
We describe the NRC-CNRC systems submitted to the AmericasNLP shared task on machine translation. We submitted systems translating from Spanish into Wixárika, Nahuatl, Rarámuri, and Guaraní. Our best neural machine translation systems used multilingual pretraining, ensembling, finetuning, training on parts of the development data, and subword regularization. We also submitted translation memory systems as a strong baseline.
Anthology ID:
2021.americasnlp-1.25
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Editors:
Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
224–233
URL:
https://aclanthology.org/2021.americasnlp-1.25
DOI:
10.18653/v1/2021.americasnlp-1.25
Cite (ACL):
Rebecca Knowles, Darlene Stewart, Samuel Larkin, and Patrick Littell. 2021. NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 224–233, Online. Association for Computational Linguistics.
Cite (Informal):
NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task (Knowles et al., AmericasNLP 2021)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.americasnlp-1.25.pdf