Leveraging Large Language Models for Spanish-Indigenous Language Machine Translation at AmericasNLP 2025

Mahshar Yahan, Dr. Mohammad Islam


Abstract
This paper presents our approach to machine translation between Spanish and 13 Indigenous languages of the Americas as part of the AmericasNLP 2025 shared task. To address the challenges of low-resource translation, we fine-tuned multilingual models, including NLLB-200 (Distilled-600M), Llama 3.1 (8B-Instruct), and XGLM 1.7B, using techniques such as dynamic batching, token adjustments, and embedding initialization. Preprocessing steps such as punctuation removal and tokenization refinements were applied to improve generalization. Our models performed well on Awajun and Quechua but struggled with morphologically complex languages such as Nahuatl and Otomí. In the Spanish-to-Indigenous track (Es→Xx), our approach achieved competitive ChrF++ scores of 35.16 for Awajun and 31.01 for Quechua; in the Indigenous-to-Spanish track (Xx→Es), it achieved 33.70 for Awajun and 31.71 for Quechua. These results underscore the potential of tailored methodologies for preserving linguistic diversity while advancing machine translation for endangered languages.
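For readers unfamiliar with the metric reported above, the following is a minimal sketch of a sentence-level ChrF++-style score: an F-score (recall weighted by beta = 2) averaged over character n-grams up to order 6 and word n-grams up to order 2. It is a simplified illustration, not the shared task's official scorer; the function names `ngrams` and `chrf_pp` are hypothetical, and reference implementations such as sacreBLEU differ in details like punctuation handling and corpus-level aggregation, so the numbers it produces should not be compared directly with those in the abstract.

```python
from collections import Counter


def ngrams(seq, n):
    """Count all contiguous n-grams of a string or a list of tokens."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level ChrF++-style score on a 0-100 scale.

    Assumption: precision and recall are averaged uniformly over every
    n-gram order for which both sentences have at least one n-gram.
    """
    # Character n-grams are taken over the text with whitespace removed.
    hyp_chars = "".join(hypothesis.split())
    ref_chars = "".join(reference.split())
    # Word n-grams use plain whitespace tokenization (a simplification).
    hyp_words = hypothesis.split()
    ref_words = reference.split()

    precisions, recalls = [], []
    for hyp_seq, ref_seq, max_n in (
        (hyp_chars, ref_chars, char_order),
        (hyp_words, ref_words, word_order),
    ):
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_seq, n)
            ref_counts = ngrams(ref_seq, n)
            if not hyp_counts or not ref_counts:
                continue  # this order is undefined for one of the sentences
            # Clipped overlap: each n-gram matches at most as often as it
            # appears in both sentences.
            overlap = sum((hyp_counts & ref_counts).values())
            precisions.append(overlap / sum(hyp_counts.values()))
            recalls.append(overlap / sum(ref_counts.values()))

    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return 100 * f_beta


if __name__ == "__main__":
    # Illustrative Spanish strings only; not data from the shared task.
    print(chrf_pp("el perro corre", "el perro corre rápido"))
```

Because the score is built from character n-grams, a hypothesis that gets most of a long, morphologically complex word right still earns partial credit, which is one reason character-level metrics are commonly used for languages like those in this task.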
Anthology ID:
2025.americasnlp-1.15
Volume:
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Manuel Mager, Abteen Ebrahimi, Robert Pugh, Shruti Rijhwani, Katharina Von Der Wense, Luis Chiruzzo, Rolando Coto-Solano, Arturo Oncevay
Venues:
AmericasNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
126–133
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.americasnlp-1.15/
DOI:
10.18653/v1/2025.americasnlp-1.15
Cite (ACL):
Mahshar Yahan and Dr. Mohammad Islam. 2025. Leveraging Large Language Models for Spanish-Indigenous Language Machine Translation at AmericasNLP 2025. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP), pages 126–133, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Leveraging Large Language Models for Spanish-Indigenous Language Machine Translation at AmericasNLP 2025 (Yahan & Islam, AmericasNLP 2025)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.americasnlp-1.15.pdf