Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

Edward Gow-Smith, Danae Sánchez Villegas


Abstract
The University of Sheffield took part in the 2023 AmericasNLP shared task for all eleven language pairs. Our approach consists of training different variations of the NLLB-200 model on data provided by the organizers and on available data from various sources, such as constitutions, handbooks and news articles. Our models outperform the baseline model on the development set in terms of chrF, with substantial improvements particularly for Aymara, Guarani and Quechua. On the test set, our best submission achieves the highest average chrF of all the submissions; we rank first in four of the eleven languages, and at least one of our models ranks in the top 3 for all languages.
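All results above are reported in chrF, a character n-gram F-score. As a rough illustration of what the metric measures, the sketch below computes a sentence-level chrF under the common default settings, which are assumptions here: character n-grams of order 1 to 6, whitespace ignored, and β = 2 so that recall counts more than precision. It is a simplified sketch, not the scorer used to produce the reported numbers. Toolkits such as sacreBLEU compute corpus-level chrF by aggregating n-gram statistics over all sentences before computing the score, so per-sentence values from this function will not average to the official figures.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (assumed default)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative sketch).

    Precision and recall are averaged over the n-gram orders for which both
    strings contain at least one n-gram, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # skip orders longer than one of the strings
        matches = sum((hyp & ref).values())  # clipped matches: min count per n-gram
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta > 1 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Hypothetical example sentences, not shared-task data.
    print(chrf("the cat sat on the mat", "the cat sat on a mat"))
```

Because matching is done on characters rather than whole words, a hypothesis that gets most of a long word right still earns partial credit, which is one reason chrF is often preferred over word-level metrics for morphologically rich languages.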
Anthology ID: 2023.americasnlp-1.21
Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue: AmericasNLP
Publisher: Association for Computational Linguistics
Pages: 192–199
URL: https://aclanthology.org/2023.americasnlp-1.21
DOI: 10.18653/v1/2023.americasnlp-1.21
Cite (ACL): Edward Gow-Smith and Danae Sánchez Villegas. 2023. Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 192–199, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages (Gow-Smith & Sánchez Villegas, AmericasNLP 2023)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.americasnlp-1.21.pdf