Oscar Moreno


2025

Findings of the AmericasNLP 2025 Shared Tasks on Machine Translation, Creation of Educational Material, and Translation Metrics for Indigenous Languages of the Americas
Ona De Gibert | Robert Pugh | Ali Marashian | Raul Vazquez | Abteen Ebrahimi | Pavel Denisov | Enora Rice | Edward Gow-Smith | Juan Prieto | Melissa Robles | Rubén Manrique | Oscar Moreno | Angel Lino | Rolando Coto-Solano | Aldo Alvarez | Marvin Agüero-Torales | John E. Ortega | Luis Chiruzzo | Arturo Oncevay | Shruti Rijhwani | Katharina Von Der Wense | Manuel Mager
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This paper presents the findings of the AmericasNLP 2025 Shared Tasks: (1) machine translation for truly low-resource languages, (2) morphological adaptation for generating educational examples, and (3) developing metrics for machine translation in Indigenous languages. The shared tasks cover 14 diverse Indigenous languages of the Americas. A total of 11 teams participated, submitting 26 systems across all tasks, languages, and models. We describe the shared tasks, introduce the datasets and evaluation metrics used, summarize the baselines and submitted systems, and report our findings.

2024

Awajun-OP: Multi-domain dataset for Spanish–Awajun Machine Translation
Oscar Moreno | Yanua Atamain | Arturo Oncevay
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)

We introduce a Spanish–Awajun parallel dataset of 22k high-quality sentence pairs, built with the help of the journalistic organization Company C. This dataset consists of parallel data obtained from various web sources such as poems, stories, laws, protocols, guidelines, handbooks, the Bible, and news published by Company C. The study also includes an analysis of the dataset’s performance for Spanish–Awajun translation using a Transformer architecture with transfer learning from a parent model, utilizing Spanish–English and Spanish–Finnish as high-resource language pairs. As far as we know, this is the first Spanish–Awajun machine translation study, and we hope that this work will serve as a starting point for future research on this neglected Peruvian language.

2021

The REPUcs Spanish–Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation
Oscar Moreno
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

We present the submission of REPUcs to the AmericasNLP machine translation shared task for the low-resource language pair Spanish–Quechua. Our neural machine translation system ranked first in Track two (development set not used for training) and third in Track one (training includes development data). Our contribution focuses on: (i) the collection of new parallel data from different web sources (poems, lyrics, lexicons, handbooks), and (ii) using large Spanish–English data for pre-training and then fine-tuning the Spanish–Quechua system. This paper describes the new parallel corpora and our approach in detail.