π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl
Juan-José Guzmán-Landa, Juan-Manuel Torres-Moreno, Martha Lorena Avendaño Garrido, Miguel Figueroa-Saavedra, Ligia Quintana-Torres, Graham Ranger, Carlos-Emiliano González-Gallardo, Elvys Linhares-Pontes, Patricia Velázquez-Morales, Luis-Gil Moreno-Jiménez
Abstract
π-YALLI : a new corpus for Nahuatl Language Models The Nahuatl is a language with few computational resources, despite the fact that it is a living language spoken by around two million people. We built π-YALLI, a corpus that enables research and development of dynamic and static Language Models (LM). We measured the perplexity of π-YALLI, evaluating state-of-the-art LM performance on a manually annotated semantic similarity corpus relative to annotator agreement. The results show the difficulty of working with this π-language, but at the same time open up interesting perspectives for the study of other NLP tasks on Nahuatl.- Anthology ID:
- 2025.jeptalnrecital-taln.49
- Volume:
- Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux
- Month:
- 6
- Year:
- 2025
- Address:
- Marseille, France
- Editors:
- Frédéric Bechet, Adrian-Gabriel Chifu, Karen Pinel-sauvagnat, Benoit Favre, Eliot Maes, Diana Nurbakova
- Venue:
- JEP/TALN/RECITAL
- SIG:
- Publisher:
- ATALA \\& ARIA
- Note:
- Pages:
- 802–816
- Language:
- URL:
- https://preview.aclanthology.org/corrections-2025-10/2025.jeptalnrecital-taln.49/
- DOI:
- Cite (ACL):
- Juan-José Guzmán-Landa, Juan-Manuel Torres-Moreno, Martha Lorena Avendaño Garrido, Miguel Figueroa-Saavedra, Ligia Quintana-Torres, Graham Ranger, Carlos-Emiliano González-Gallardo, Elvys Linhares-Pontes, Patricia Velázquez-Morales, and Luis-Gil Moreno-Jiménez. 2025. π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl. In Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux, pages 802–816, Marseille, France. ATALA \\& ARIA.
- Cite (Informal):
- π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl (Guzmán-Landa et al., JEP/TALN/RECITAL 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-10/2025.jeptalnrecital-taln.49.pdf