@inproceedings{ahmed-etal-2023-enhancing,
title = "Enhancing {S}panish-{Q}uechua Machine Translation with Pre-Trained Models and Diverse Data Sources: {LCT}-{EHU} at {A}mericas{NLP} Shared Task",
author = "Ahmed, Nouman and
Flechas Manrique, Natalia and
Petrovi{\'c}, Antonije",
editor = "Mager, Manuel and
Ebrahimi, Abteen and
Oncevay, Arturo and
Rice, Enora and
Rijhwani, Shruti and
Palmer, Alexis and
Kann, Katharina",
booktitle = "Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2023.americasnlp-1.16/",
doi = "10.18653/v1/2023.americasnlp-1.16",
pages = "156--162",
abstract = "We present the LCT-EHU submission to the AmericasNLP 2023 low-resource machine translation shared task. We focus on the Spanish-Quechua language pair and explore the usage of different approaches: (1) Obtain new parallel corpora from the literature and legal domains, (2) Compare a high-resource Spanish-English pre-trained MT model with a Spanish-Finnish pre-trained model (with Finnish being chosen as a target language due to its morphological similarity to Quechua), and (3) Explore additional techniques such as copied corpus and back-translation. Overall, we show that the Spanish-Finnish pre-trained model outperforms other setups, while low-quality synthetic data reduces the performance."
}