Nicolás Giossa


2022

Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation?
Santiago Góngora | Nicolás Giossa | Luis Chiruzzo
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

Machine translation for low-resource languages, such as Guarani, is a challenging task due to the lack of data. One way of tackling it is using pretrained word embeddings for model initialization. In this work we try to check if currently available data is enough to train rich embeddings for enhancing MT for Guarani and Spanish, by building a set of word embedding collections and training MT systems using them. We found that the trained vectors are strong enough to slightly improve the performance of some of the translation models and also to speed up the training convergence.

2021

Experiments on a Guarani Corpus of News and Social Media
Santiago Góngora | Nicolás Giossa | Luis Chiruzzo
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

While Guarani is widely spoken in South America, obtaining a large amount of Guarani text from the web is hard. We present the building process of a Guarani corpus composed of a parallel Guarani-Spanish set of news articles, and a monolingual set of tweets. We perform some word embeddings experiments aiming at evaluating the quality of the Guarani split of the corpus, finding encouraging results but noticing that more diversity in text domains might be needed for further improvements.