Abstract
Machine translation (MT) for low-resource languages such as Guarani is challenging because of the lack of data. One way of tackling this is to initialize models with pretrained word embeddings. In this work we examine whether the currently available data is enough to train embeddings rich enough to enhance Guarani-Spanish MT, by building a set of word embedding collections and training MT systems with them. We found that the trained vectors are strong enough to slightly improve the performance of some of the translation models and to speed up training convergence.
- Anthology ID:
- 2022.computel-1.16
- Volume:
- Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
- Venue:
- ComputEL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 127–132
- URL:
- https://aclanthology.org/2022.computel-1.16
- DOI:
- 10.18653/v1/2022.computel-1.16
- Cite (ACL):
- Santiago Góngora, Nicolás Giossa, and Luis Chiruzzo. 2022. Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation?. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 127–132, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation? (Góngora et al., ComputEL 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.computel-1.16.pdf
- Code:
- sgongora27/Guarani-embeddings-for-MT
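
The abstract mentions using pretrained word embeddings for model initialization. As a rough illustration of that general idea only, the sketch below fills an embedding table for a model vocabulary: words found in a pretrained collection copy their vector, and the rest get small random values. The function name `build_embedding_matrix`, the toy tokens, the vector values, and the `[-0.1, 0.1]` initialization range are all assumptions made for this example; none of them come from the paper or its repository.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build one vector per vocabulary entry (hypothetical helper).

    Words present in `pretrained` with the expected dimension copy their
    pretrained vector; all other words are initialized randomly.
    Returns the matrix and the number of words covered by `pretrained`.
    """
    rng = random.Random(seed)
    matrix = []
    hits = 0
    for word in vocab:
        vector = pretrained.get(word)
        if vector is not None and len(vector) == dim:
            matrix.append(list(vector))  # copy so the source is not mutated
            hits += 1
        else:
            # Assumed fallback: small uniform noise for uncovered words.
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix, hits


# Toy, made-up vectors standing in for a trained embedding collection.
pretrained = {"jagua": [0.1, 0.2, 0.3], "mbarakaja": [0.4, 0.5, 0.6]}
vocab = ["<unk>", "jagua", "mbarakaja", "óga"]

matrix, hits = build_embedding_matrix(vocab, pretrained, dim=3)
print(f"{hits} of {len(vocab)} vocabulary entries initialized from pretrained vectors")
```

In a real system the resulting table would be loaded into the translation model's embedding layer before training. The coverage count is worth tracking in a low-resource setting, since a small embedding collection may cover only part of the MT vocabulary.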