Visibility as Survival: Generalizing NLP for Native Alaskan Language Identification

Ivory Yang, Chunhui Zhang, Yuxin Wang, Zhongyu Ouyang, Soroush Vosoughi


Abstract
Indigenous languages remain largely invisible in commercial language identification (LID) systems, a stark reality exemplified by Google Translate’s LangID tool, which supports over 100 languages but excludes all 150 Indigenous languages of North America. This technological marginalization is particularly acute for Alaska’s 20 Native languages, all of which face endangerment despite their rich linguistic heritage. We present GenAlaskan, a framework demonstrating how both large language models and specialized classifiers can effectively identify these languages with minimal data. Working closely with Native Alaskan community members, we create Akutaq-2k, a carefully curated dataset of 2000 sentences spanning all 20 languages, named after the traditional Yup’ik dessert, symbolizing the blending of diverse elements. We design few-shot prompting on proprietary and open-source LLMs, achieving nearly perfect accuracy with just 40 examples per language. While initial zero-shot attempts show limited success, our systematic attention head pruning revealed critical architectural components for accurate language differentiation, providing insights into model decision-making for low-resource languages. Our results challenge the notion that effective Indigenous language identification requires massive resources or corporate infrastructure, demonstrating that targeted technological interventions can drive meaningful progress in preserving endangered languages in the digital age.
Anthology ID:
2025.findings-acl.364
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
6965–6979
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.364/
DOI:
Bibkey:
Cite (ACL):
Ivory Yang, Chunhui Zhang, Yuxin Wang, Zhongyu Ouyang, and Soroush Vosoughi. 2025. Visibility as Survival: Generalizing NLP for Native Alaskan Language Identification. In Findings of the Association for Computational Linguistics: ACL 2025, pages 6965–6979, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Visibility as Survival: Generalizing NLP for Native Alaskan Language Identification (Yang et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.364.pdf