Harnessing NLP for Indigenous Language Education: Fine-Tuning Large Language Models for Sentence Transformation

Mahshar Yahan, Dr. Mohammad Islam


Abstract
Indigenous languages face significant challenges due to their endangered status and limited resources, which make their integration into NLP systems difficult. This study investigates the use of Large Language Models (LLMs) for sentence transformation tasks in Indigenous languages, focusing on Bribri, Guarani, and Maya. The dataset from the AmericasNLP 2025 Shared Task 2 is used to explore sentence transformations in these languages. The goal is to create educational tools by modifying sentences according to linguistic instructions, such as changes in tense, aspect, voice, person, and other grammatical features. The methodology involves preprocessing the data, simplifying transformation tags, and designing zero-shot and few-shot prompts to guide LLMs in sentence rewriting. Fine-tuning techniques such as LoRA and Bits-and-Bytes quantization were employed to optimize model performance while reducing computational costs. Among the tested models, Llama 3.2 (3B-Instruct) demonstrated the strongest performance across all three languages, with high BLEU and ChrF++ scores, particularly in few-shot settings. On the test set, the Llama 3.2 model achieved BLEU scores of 19.51 for Bribri, 13.67 for Guarani, and 55.86 for Maya, with ChrF++ scores of 50.29 for Bribri, 58.55 for Guarani, and 80.12 for Maya, showing its effectiveness on sentence transformation. These results highlight the potential of LLMs to improve NLP tools for Indigenous languages and to help preserve linguistic diversity.
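The quantized fine-tuning setup named in the abstract (Bits-and-Bytes 4-bit loading combined with LoRA adapters) can be sketched with the Hugging Face transformers and peft libraries. The sketch below is illustrative only, not the paper's actual configuration: the checkpoint ID corresponds to the standard Llama 3.2 3B-Instruct release, and the LoRA rank, alpha, dropout, and target modules are assumed placeholder values rather than the paper's reported hyperparameters.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"  # standard HF checkpoint; assumed here

# Load the base model in 4-bit NF4 precision to reduce memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Prepare the quantized model for training, then attach small trainable
# LoRA adapters; the hyperparameters below are illustrative assumptions.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated

The BLEU and ChrF++ figures quoted in the abstract are the standard metrics implemented by sacreBLEU; a minimal sketch, assuming hypotheses and references are aligned lists of strings (placeholder values shown):

import sacrebleu

hypotheses = ["transformed sentence from the model"]  # placeholder
references = ["gold transformed sentence"]            # placeholder

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)  # word_order=2 gives chrF++
print(f"BLEU: {bleu.score:.2f}  ChrF++: {chrf.score:.2f}")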
Anthology ID:
2025.americasnlp-1.14
Volume:
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Manuel Mager, Abteen Ebrahimi, Robert Pugh, Shruti Rijhwani, Katharina von der Wense, Luis Chiruzzo, Rolando Coto-Solano, Arturo Oncevay
Venues:
AmericasNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
119–125
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.americasnlp-1.14/
Cite (ACL):
Mahshar Yahan and Dr. Mohammad Islam. 2025. Harnessing NLP for Indigenous Language Education: Fine-Tuning Large Language Models for Sentence Transformation. In Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP), pages 119–125, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Harnessing NLP for Indigenous Language Education: Fine-Tuning Large Language Models for Sentence Transformation (Yahan & Islam, AmericasNLP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.americasnlp-1.14.pdf