Julia Mainzinger
2026
IndigiEval: Evaluating LLMs in North American Indigenous Languages
Julia Mainzinger | Jacqueline Brixey
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper presents IndigiEval, a framework for evaluating the language and cultural proficiency of several commercially available large language models (LLMs) across five North American Indigenous languages (Mvskoke, Choctaw, Cherokee, Cheyenne, and Hawaiian). This framework is a qualitative evaluation method intended for communities with small speaker populations to be able to critically evaluate LLM performance with minimal data and human effort. IndigiEval includes tasks such as answering cultural questions, translation, text generation, and speech recognition. The results of our experiments indicate that no currently available LLM performs well across all evaluation categories, and that LLMs frequently hallucinate orthographies, grammatical structures, cultural knowledge, and vocabulary for all languages and cultures considered. Our proposed evaluation framework is not intended as a comprehensive score, but rather a qualitative and flexible framework to inform language communities about a given LLM’s potential as a resource, since each language has unique environments, strengths, and availability of resources.
2024
Technology and Language Revitalization: A Roadmap for the Mvskoke Language
Julia Mainzinger
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages
This paper is a discussion of how NLP can come alongside community efforts to aid in revitalizing the Mvskoke language. Mvskoke is a language indigenous to the southeastern United States that has seen an increase in language revitalization efforts in the last few years. This paper presents an overview of available resources in Mvskoke, an exploration of relevant NLP tasks and related work in endangered language contexts, and applications to language revitalization.
Fine-Tuning ASR models for Very Low-Resource Languages: A Study on Mvskoke
Julia Mainzinger | Gina-Anne Levow
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Recent advancements in multilingual models for automatic speech recognition (ASR) have achieved high accuracy for languages with extremely limited resources. This study examines ASR modeling for the Mvskoke language, an indigenous language of America. The parameter efficiency of adapter training is contrasted with training entire models, and it is demonstrated how performance varies with different amounts of data. Additionally, the models are evaluated with trigram language model decoding, and the outputs are compared across different types of speech recordings. Results show that training an adapter is both parameter efficient and gives higher accuracy for a relatively small amount of data.