Nathaniel Krasner
2025
Machine Translation Using Grammar Materials for LLM Post-Correction
Jonathan Hus | Antonios Anastasopoulos | Nathaniel Krasner
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper describes George Mason University’s submission to the AmericasNLP 2025 Shared Task on Machine Translation into Indigenous Languages. We prompt a large language model (LLM) with grammar reference materials to correct the translations produced by a fine-tuned encoder-decoder machine translation system. This system leads to improvements when translating from the Indigenous languages into Spanish, indicating that LLMs are capable of using grammar materials to decipher an unseen language.
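The post-correction setup described above can be sketched as a prompt-construction step: the grammar excerpt and the draft translation from the NMT system are assembled into a single LLM prompt. This is a minimal illustrative sketch, not the authors' exact prompt; the function name, wording, and the Bribri example sentence are assumptions.

```python
# Hypothetical sketch of prompting an LLM with grammar reference material
# to post-correct a draft NMT translation. All names and prompt wording
# here are illustrative, not the paper's exact setup.

def build_correction_prompt(grammar_excerpt, source_sentence, draft_translation):
    """Assemble an LLM prompt asking for a corrected translation."""
    return (
        "You are given reference grammar material for a language.\n\n"
        f"Grammar notes:\n{grammar_excerpt}\n\n"
        f"Source sentence: {source_sentence}\n"
        f"Draft translation (from an NMT system): {draft_translation}\n\n"
        "Using the grammar notes, correct any errors in the draft "
        "translation and output only the corrected translation."
    )

# Toy usage with an invented grammar note and sentence pair.
prompt = build_correction_prompt(
    "The ergative marker follows the subject noun phrase.",
    "Ye' tö ù sú.",
    "I saw the house.",
)
```

The returned string would then be sent to the LLM; the model's reply replaces the draft translation.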
Machine Translation Metrics for Indigenous Languages Using Fine-tuned Semantic Embeddings
Nathaniel Krasner | Justin Vasselli | Belu Ticona | Antonios Anastasopoulos | Chi-Kiu Lo
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper describes the Tekio submission to the AmericasNLP 2025 shared task on machine translation metrics for Indigenous languages. We developed two primary metric approaches leveraging multilingual semantic embeddings. First, we fine-tuned the Language-agnostic BERT Sentence Encoder (LaBSE) specifically for Guarani, Bribri, and Nahuatl, significantly enhancing semantic representation quality. Next, we integrated our fine-tuned LaBSE into the semantic similarity metric YiSi-1, exploring the effectiveness of averaging multiple layers. Additionally, we trained regression-based COMET metrics (COMET-DA) using the fine-tuned LaBSE embeddings as a semantic backbone, comparing Mean Absolute Error (MAE) and Mean Squared Error (MSE) loss functions. Our YiSi-1 metric, using layer-averaged embeddings selected per language by development-set performance, achieved the highest average correlation across languages among our submitted systems, and our COMET models demonstrated competitive performance for Guarani.
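The layer-averaging idea above can be sketched in a few lines: average the encoder's hidden states over a chosen set of layers, pool over tokens, and compare sentence vectors by cosine similarity. This is a simplified stand-in (YiSi-1 proper uses token-level alignment rather than plain mean pooling), and the layer indices and toy arrays below are assumptions for illustration.

```python
import numpy as np

def layer_averaged_embedding(hidden_states, layers):
    """Average token embeddings over the selected encoder layers,
    then mean-pool over tokens to obtain a sentence vector.

    hidden_states: array of shape (num_layers, num_tokens, dim)
    layers: list of layer indices to average over
    """
    picked = hidden_states[layers]      # (len(layers), num_tokens, dim)
    token_vecs = picked.mean(axis=0)    # (num_tokens, dim)
    return token_vecs.mean(axis=0)      # (dim,)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for encoder outputs (13 layers, 5 tokens, dim 8);
# a real system would take these from a fine-tuned LaBSE model.
rng = np.random.default_rng(0)
hyp_states = rng.normal(size=(13, 5, 8))
ref_states = hyp_states + rng.normal(scale=0.01, size=(13, 5, 8))

chosen_layers = [9, 10, 11, 12]  # assumed per-language selection
hyp_vec = layer_averaged_embedding(hyp_states, chosen_layers)
ref_vec = layer_averaged_embedding(ref_states, chosen_layers)
score = cosine(hyp_vec, ref_vec)
```

In the submitted system, the set of layers to average would be chosen per language by correlation on the development set.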
2022
Revisiting the Effects of Leakage on Dependency Parsing
Nathaniel Krasner | Miriam Wanner | Antonios Anastasopoulos
Findings of the Association for Computational Linguistics: ACL 2022
Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
Co-authors
- Antonios Anastasopoulos 3
- Jonathan Hus 1
- Chi-Kiu Lo 1
- Belu Ticona 1
- Justin Vasselli 1