Arturo Martínez Peguero


2025

pdf bib
Leveraging Dictionaries and Grammar Rules for the Creation of Educational Materials for Indigenous Languages
Justin Vasselli | Haruki Sakajo | Arturo Martínez Peguero | Frederikus Hudi | Taro Watanabe
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This paper describes the NAIST submission to the AmericasNLP 2025 shared task on the creation of educational materials for Indigenous languages. We implement three systems to tackle the unique challenges of each language. The first system, used for Maya and Guarani, employs a straightforward GPT-4o few-shot prompting technique, enhanced by synthetically generated examples to ensure coverage of all grammatical variations encountered. The second system, used for Bribri, integrates dictionary-based alignment and linguistic rules to systematically manage linguisticand lexical transformations. Finally, we developed a specialized rule-based system for Nahuatl that systematically reduces sentences to their base form, simplifying the generation of correct morphology variants.

2024

pdf bib
Applying Linguistic Expertise to LLMs for Educational Material Development in Indigenous Languages
Justin Vasselli | Arturo Martínez Peguero | Junehwan Sung | Taro Watanabe
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)

This paper presents our approach to the AmericasNLP 2024 Shared Task 2 as the JAJ (/dʒæz/) team. The task aimed at creating educational materials for indigenous languages, and we focused on Maya and Bribri. Given the unique linguistic features and challenges of these languages, and the limited size of the training datasets, we developed a hybrid methodology combining rule-based NLP methods with prompt-based techniques. This approach leverages the meta-linguistic capabilities of large language models, enabling us to blend broad, language-agnostic processing with customized solutions. Our approach lays a foundational framework that can be expanded to other indigenous languages languages in future work.