Arturo Mp
2026
Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli | Arturo Mp | Frederikus Hudi | Haruki Sakajo | Taro Watanabe
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Justin Vasselli | Arturo Mp | Frederikus Hudi | Haruki Sakajo | Taro Watanabe
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
This paper presents an evaluation framework for probing large language models’ linguistic knowledge of Indigenous languages of the Americas using zero- and few-shot prompting. The framework consists of three tasks: (1) language identification, (2) cloze completion of Spanish sentences supported by Indigenous-language translations, and (3) grammatical feature classification. We evaluate models from five major families (GPT, Gemini, DeepSeek, Qwen, and LLaMA) on 13 Indigenous languages, including Bribri, Guarani, and Nahuatl. The results show substantial variation across both languages and model families. While a small number of model-language combinations demonstrate consistently stronger performance across tasks, many others perform near chance, highlighting persistent gaps in current models’ abilities on Indigenous languages.