Graham Ranger
2026
Nawatl Context-Free Grammars for Natural Language Processing
Juan Jose Guzman Landa | Juan-Manuel Torres-Moreno | Graham Ranger | Miguel Figueroa-Saavedra | Ligia Quintana Torres | Carlos-Emiliano Gonzalez-Gallardo | Luis Gil Moreno Jimenez | Martha Lorena Avendaño Garrido
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The aim of this article is to introduce Context-Free Grammars (CFGs) for the Nawatl language. Nawatl is an Amerindian language of the π-language type, i.e. a language with few digital resources. For this reason, the corpora available for training Large Language Models (LLMs) are virtually non-existent, which poses a significant challenge. The goal is to produce a substantial number of syntactically valid artificial Nawatl sentences and thereby expand the corpora available for learning embeddings (static models or possibly LLMs). To this end, we introduce two new Nawatl CFGs and use them in generative mode. These grammars make it possible to expand the Nawatl corpus significantly, then use it to learn embeddings (such as FastText) and evaluate their relevance on semantic similarity tasks. The results show an improvement over those obtained using only the original corpus without artificial expansion, and also demonstrate that economical embeddings often perform better than some LLMs.
2025
π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl
Juan-José Guzmán-Landa | Juan-Manuel Torres-Moreno | Martha Lorena Avendaño Garrido | Miguel Figueroa-Saavedra | Ligia Quintana-Torres | Graham Ranger | Carlos-Emiliano González-Gallardo | Elvys Linhares-Pontes | Patricia Velázquez-Morales | Luis-Gil Moreno-Jiménez
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux
π-YALLI: a new corpus for Nahuatl Language Models
Nahuatl is a language with few computational resources, despite being a living language spoken by around two million people. We built π-YALLI, a corpus that enables the research and development of dynamic and static Language Models (LMs). We measured the perplexity of π-YALLI and evaluated the performance of state-of-the-art LMs on a manually annotated semantic similarity corpus, relative to inter-annotator agreement. The results show the difficulty of working with this π-language, but also open up interesting perspectives for the study of other NLP tasks on Nahuatl.