Machine Translation Metrics for Indigenous Languages Using Fine-tuned Semantic Embeddings
Nathaniel Krasner, Justin Vasselli, Belu Ticona, Antonios Anastasopoulos, Chi-Kiu Lo
Abstract
This paper describes the Tekio submission to the AmericasNLP 2025 shared task on machine translation metrics for Indigenous languages. We developed two primary metric approaches leveraging multilingual semantic embeddings. First, we fine-tuned the Language-agnostic BERT Sentence Encoder (LaBSE) specifically for Guarani, Bribri, and Nahuatl, significantly enhancing semantic representation quality. Next, we integrated our fine-tuned LaBSE into the semantic similarity metric YiSi-1, exploring the effectiveness of averaging multiple layers. Additionally, we trained regression-based COMET metrics (COMET-DA) using the fine-tuned LaBSE embeddings as a semantic backbone, comparing Mean Absolute Error (MAE) and Mean Squared Error (MSE) loss functions. Among our submitted systems, the YiSi-1 metric using layer-averaged embeddings, with the layer configuration chosen separately for each language by its performance on the development set, achieved the highest average correlation across languages, and our COMET models demonstrated competitive performance for Guarani.
- Anthology ID: 2025.americasnlp-1.11
- Volume: Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
- Month: May
- Year: 2025
- Address: Albuquerque, New Mexico
- Editors: Manuel Mager, Abteen Ebrahimi, Robert Pugh, Shruti Rijhwani, Katharina Von Der Wense, Luis Chiruzzo, Rolando Coto-Solano, Arturo Oncevay
- Venues: AmericasNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 100–104
- URL: https://preview.aclanthology.org/fix-sig-urls/2025.americasnlp-1.11/
- Cite (ACL): Nathaniel Krasner, Justin Vasselli, Belu Ticona, Antonios Anastasopoulos, and Chi-Kiu Lo. 2025. Machine Translation Metrics for Indigenous Languages Using Fine-tuned Semantic Embeddings. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP), pages 100–104, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal): Machine Translation Metrics for Indigenous Languages Using Fine-tuned Semantic Embeddings (Krasner et al., AmericasNLP 2025)
- PDF: https://preview.aclanthology.org/fix-sig-urls/2025.americasnlp-1.11.pdf
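The abstract mentions averaging multiple encoder layers and choosing, for each language, the configuration that performs best on the development set. The following is a minimal Python sketch of that selection idea only; it is not the authors' code and it does not implement YiSi-1 or COMET. The function names (`average_layers`, `pearson`, `select_config`), the nested-list data layout, and the use of Pearson correlation as the selection criterion are all assumptions made for illustration.

```python
import math


def average_layers(layers, chosen):
    """Average token embeddings over the chosen layer indices.

    Assumed layout: layers[l][t] is the embedding (a list of floats)
    of token t at encoder layer l.
    """
    first = layers[chosen[0]]
    dim = len(first[0])
    return [
        [sum(layers[l][t][d] for l in chosen) / len(chosen) for d in range(dim)]
        for t in range(len(first))
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Returns 0.0 when either list has zero variance (assumed convention).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_config(dev_scores_by_config, human_scores):
    """Pick the configuration whose dev-set metric scores correlate
    best with the human scores (hypothetical selection rule)."""
    return max(
        dev_scores_by_config,
        key=lambda name: pearson(dev_scores_by_config[name], human_scores),
    )
```

Under these assumptions, `select_config` would be run once per language, with one entry per candidate layer configuration mapping to that configuration's metric scores on the language's development set.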