Nawatl Context-Free Grammars for Natural Language Processing

Juan Jose Guzman Landa, Juan-Manuel Torres-Moreno, Graham Ranger, Miguel Figueroa-Saavedra, Ligia Quintana Torres, Carlos-Emiliano Gonzalez-Gallardo, Luis Gil Moreno Jimenez, Martha Lorena Avendaño Garrido


Abstract
The aim of this article is to introduce Context-Free Grammars (CFGs) for the Nawatl language. Nawatl is an Amerindian language of the 𝜋-language type, i.e. a language with few digital resources. As a result, the corpora available for training Large Language Models (LLMs) are virtually non-existent, which poses a significant challenge. The goal is to produce a large number of syntactically valid artificial Nawatl sentences and thereby expand the corpora used to learn embeddings (static models or possibly LLMs). To this end, we introduce two new Nawatl CFGs and use them in generative mode. With these grammars, the Nawatl corpus can be expanded significantly and then used to learn embeddings (such as FastText) and to evaluate their relevance in semantic similarity tasks. The results show an improvement over those obtained using only the original corpus without artificial expansion, and also show that economical embeddings often perform better than some LLMs.
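To illustrate what using a CFG "in generative mode" can look like, here is a minimal sketch in Python. It is not the authors' grammar or code: the rules, the symbol names, and the placeholder tokens (`det1`, `noun1`, `verb1`, ...) are all hypothetical and are not real Nawatl vocabulary. The sketch assumes a small, non-recursive grammar, so every derivable sentence can be enumerated exhaustively.

```python
from itertools import product

# Hypothetical toy grammar: placeholder symbols only, NOT the paper's rules
# and NOT real Nawatl words. Keys are non-terminals; each value lists the
# alternative right-hand sides for that non-terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["N"]],
    "VP": [["V"], ["V", "NP"]],
    "DET": [["det1"]],
    "N": [["noun1"], ["noun2"]],
    "V": [["verb1"], ["verb2"]],
}


def expand(symbol, grammar):
    """Return every terminal sequence derivable from `symbol`.

    Assumes the grammar is non-recursive; a recursive grammar would need a
    depth limit or random sampling instead of exhaustive enumeration.
    """
    if symbol not in grammar:  # terminal token
        return [[symbol]]
    results = []
    for production in grammar[symbol]:
        # Expand each symbol of the right-hand side, then combine the
        # alternatives with a Cartesian product.
        parts = [expand(s, grammar) for s in production]
        for combo in product(*parts):
            results.append([tok for part in combo for tok in part])
    return results


def generate_corpus(grammar, start="S"):
    """Enumerate all sentences of the grammar as space-joined strings."""
    return [" ".join(tokens) for tokens in expand(start, grammar)]


corpus = generate_corpus(GRAMMAR)
print(len(corpus))
print(corpus[0])
```

Sentences generated this way could then be appended to an existing corpus before training embeddings. A realistic grammar would also need lexicons and agreement constraints, which this sketch leaves out.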
Anthology ID:
2026.lrec-main.263
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
3333–3342
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.263/
Cite (ACL):
Juan Jose Guzman Landa, Juan-Manuel Torres-Moreno, Graham Ranger, Miguel Figueroa-Saavedra, Ligia Quintana Torres, Carlos-Emiliano Gonzalez-Gallardo, Luis Gil Moreno Jimenez, and Martha Lorena Avendaño Garrido. 2026. Nawatl Context-Free Grammars for Natural Language Processing. International Conference on Language Resources and Evaluation, main:3333–3342.
Cite (Informal):
Nawatl Context-Free Grammars for Natural Language Processing (Guzman Landa et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.263.pdf