Victor Mijangos


2025

Py-Elotl: A Python NLP package for the languages of Mexico
Ximena Gutierrez-Vasques | Robert Pugh | Victor Mijangos | Diego Barriga Martínez | Paul Aguilar | Mikel Segura | Paola Innes | Javier Santillan | Cynthia Montaño | Francis Tyers
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This work presents Py-elotl, a suite of tools and resources in Python for processing text in several indigenous languages spoken in Mexico. These resources include parallel corpora, linguistic taggers/analyzers, and orthographic normalization tools. The goal is to provide essential resources that support language pre-processing and linguistic research, and that enable the future creation of more complete downstream applications that could be useful for speakers and increase the visibility of these languages. The current version supports language groups such as Nahuatl, Otomi, Mixtec, and Huave. The project is open-source and freely available for use and collaboration.

2021

Automatic Interlinear Glossing for Otomi language
Diego Barriga Martínez | Victor Mijangos | Ximena Gutierrez-Vasques
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

In linguistics, interlinear glossing is an essential procedure for analyzing the morphology of languages. This type of annotation is useful for language documentation, and it can also provide valuable data for NLP applications. We perform automatic glossing for Otomi, an under-resourced language. Our work also comprises the pre-processing and annotation of the corpus. We implement different sequential labelers, and CRF models proved to be an efficient and effective solution for our task. Two main observations emerged from our work: 1) models with a higher number of parameters (RNNs) performed worse in our low-resource scenario; and 2) the information encoded in the CRF feature functions plays an important role in the prediction of labels, although competitive results are still achievable even when POS tags are not available.
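The abstract stresses that what is encoded in the CRF feature functions matters, and that competitive results remain possible without POS tags. The sketch below shows what a character-level feature template for such a glosser might look like. The feature names, the BIO encoding of morpheme glosses, the optional `pos` argument, and the placeholder strings are illustrative assumptions, not the authors' code or data.

```python
from typing import Optional


def char_features(word: str, i: int, pos: Optional[str] = None) -> dict[str, object]:
    """Hypothetical feature template for the character at index i of `word`."""
    feats: dict[str, object] = {
        "bias": 1.0,
        "char": word[i],
        "prev_char": word[i - 1] if i > 0 else "<BOS>",
        "next_char": word[i + 1] if i < len(word) - 1 else "<EOS>",
        "left_bigram": word[max(0, i - 1): i + 1],
        "is_first": i == 0,
        "is_last": i == len(word) - 1,
    }
    # The POS tag is optional: the template still works when it is unavailable.
    if pos is not None:
        feats["pos"] = pos
    return feats


def word_to_features(word: str, pos: Optional[str] = None) -> list[dict[str, object]]:
    """One feature dict per character, i.e. one CRF observation per position."""
    return [char_features(word, i, pos) for i in range(len(word))]


def bio_labels(morphemes: list[str], glosses: list[str]) -> list[str]:
    """Turn a segmented, glossed word into per-character BIO labels (assumed scheme)."""
    labels: list[str] = []
    for morph, gloss in zip(morphemes, glosses):
        labels.append(f"B-{gloss}")
        labels.extend(f"I-{gloss}" for _ in morph[1:])
    return labels


# Placeholder word "abc" split into a hypothetical prefix "ab" and stem "c".
X = word_to_features("abc", pos="V")
y = bio_labels(["ab", "c"], ["PFX", "stem"])
print(y)  # ['B-PFX', 'I-PFX', 'B-stem']
```

Sequences of such feature dicts and label lists are the input format that CRF toolkits like sklearn-crfsuite accept; dropping the `pos` argument gives the "no POS tags" variant of the same template.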

2018

Comparing morphological complexity of Spanish, Otomi and Nahuatl
Ximena Gutierrez-Vasques | Victor Mijangos
Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

We use two small parallel corpora to compare the morphological complexity of Spanish, Otomi and Nahuatl. These languages belong to different linguistic families, and the latter two are low-resourced. We take into account two quantitative criteria: on the one hand, the distribution of types over tokens in a corpus; on the other, perplexity and entropy as indicators of the predictability of word structure. We show that a language can be complex in terms of how many different morphological word forms it can produce, yet less complex in terms of the predictability of the internal structure of its words.
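The two criteria in this abstract can be illustrated with a short sketch. Here the type/token criterion is summarized as a type-token ratio, and predictability is measured as the per-symbol entropy (in bits) of a character bigram model with add-one smoothing, with perplexity defined as 2 raised to that entropy. The bigram order, the smoothing, and the word-boundary symbols are our assumptions; the paper's exact models and settings may differ.

```python
import math
from collections import Counter


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word forms (types) divided by number of tokens."""
    return len(set(tokens)) / len(tokens)


def char_bigram_entropy(words: list[str]) -> float:
    """Average per-symbol cross-entropy (bits) of an add-one smoothed character
    bigram model, estimated and evaluated on the same words (assumed setup)."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    alphabet: set[str] = set()
    for w in words:
        # "<w>" and "</w>" mark word boundaries so word-initial and word-final
        # structure is also modeled.
        chars = ["<w>"] + list(w) + ["</w>"]
        alphabet.update(chars[1:])
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    vocab_size = len(alphabet)
    total_bits = 0.0
    n = 0
    for (a, b), count in bigrams.items():
        p = (count + 1) / (contexts[a] + vocab_size)  # add-one smoothing
        total_bits += count * -math.log2(p)
        n += count
    return total_bits / n


def perplexity(entropy_bits: float) -> float:
    return 2 ** entropy_bits


tokens = ["ab", "ab", "ac"]
print(type_token_ratio(tokens))                           # 0.666...
print(perplexity(char_bigram_entropy(["ab"])))            # 2.0
```

A corpus whose words vary more internally (e.g. `["ab", "ac"]`) gets a higher entropy than one whose words are fully predictable (e.g. `["ab", "ab"]`), which is the sense in which entropy tracks word-internal predictability independently of the type-token ratio.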