Experiments in Multi-Variant Natural Language Processing for Nahuatl

Robert Pugh, Francis Tyers


Abstract
Linguistic variation is a complicating factor for digital language technologies. This is particularly true for languages that lack an official “standard” variety, including many regional and minoritized languages. In this paper, we describe a set of experiments focused on multi-variant natural language processing for Nahuatl, an indigenous Mexican language with a high level of linguistic variation and no single recognized standard variant. Using small (10k tokens), recently published annotated datasets for two Nahuatl variants, we compare the performance of single-variant, cross-variant, and joint training, and explore how different models perform on a third Nahuatl variant, unseen in training. These results and the subsequent discussion contribute to efforts to develop low-resource NLP that is robust to diatopic variation. We share all code used to process the data and run the experiments.
Anthology ID:
2024.vardial-1.12
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
140–151
URL:
https://aclanthology.org/2024.vardial-1.12
Cite (ACL):
Robert Pugh and Francis Tyers. 2024. Experiments in Multi-Variant Natural Language Processing for Nahuatl. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 140–151, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Experiments in Multi-Variant Natural Language Processing for Nahuatl (Pugh & Tyers, VarDial-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.vardial-1.12.pdf
Supplementary material:
2024.vardial-1.12.SupplementaryMaterial.txt