The REPU CS’ Spanish–Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation

Oscar Moreno


Abstract
We present the submission of REPUcs to the AmericasNLP machine translation shared task for the low-resource language pair Spanish–Quechua. Our neural machine translation system ranked first in Track two (development set not used for training) and third in Track one (training includes development data). Our contribution focuses on (i) collecting new parallel data from different web sources (poems, lyrics, lexicons, handbooks), and (ii) using large Spanish–English data for pre-training and then fine-tuning the Spanish–Quechua system. This paper describes the new parallel corpora and our approach in detail.
Anthology ID:
2021.americasnlp-1.27
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
241–247
URL:
https://aclanthology.org/2021.americasnlp-1.27
DOI:
10.18653/v1/2021.americasnlp-1.27
Cite (ACL):
Oscar Moreno. 2021. The REPU CS’ Spanish–Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 241–247, Online. Association for Computational Linguistics.
Cite (Informal):
The REPU CS’ Spanish–Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation (Moreno, AmericasNLP 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.americasnlp-1.27.pdf