Delfino Zacarías Márquez
Also published as: Delfino Zacarias Marquez
2026
The Construction of a Mixe Variant Parallel Corpus
Ivan Vladimir Meza Ruiz | Delfino Zacarias Marquez | Martha Elba Ramírez Andrés | Victoriano Santiago Cayetano | Jonathan Santiago Antonio | Carlos Daniel Hernández Mena
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present the progress and challenges of constructing a Mixe-Spanish parallel corpus for machine translation. Mixe is a Mexican Indigenous language with more than 100,000 speakers. In particular, we focus on the San Juan Guichicovi Mixe variant (mir). The resulting resource is available under an open research license (CC BY-NC-SA). It was created following a previous state-of-the-art methodology for Mexican Indigenous languages. In this case, we used paid translators from the variant region. We present a baseline system.
2021
Ayuuk-Spanish Neural Machine Translator
Delfino Zacarías Márquez | Ivan Vladimir Meza Ruiz
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
This paper presents the first neural machine translation system for the Ayuuk language. In our experiments we translate from Ayuuk to Spanish and from Spanish to Ayuuk. Ayuuk is a language spoken in the state of Oaxaca, Mexico, by the Ayuukjä’äy people (commonly known in Spanish as Mixes). We use different sources to create a low-resource parallel corpus of more than 6,000 phrases. For some of these resources we rely on automatic alignment. The proposed system is based on the Transformer neural architecture and uses sub-word level tokenization as the input. We report the current performance given the resources we have collected for the San Juan Güichicovi variant; the results are promising, reaching up to 5 BLEU. We based our development on the Masakhane project for African languages.