Indigenous Writing Systems Matter: Rethinking NLP beyond Alphabetic Bias through Script-Aware Modeling

Ngoc Tan Le, Mamady Traore, Cristian Ahumada Oliva, Fatiha Sadat


Abstract
Natural Language Processing (NLP) has made significant progress in recent years, largely driven by large-scale pretrained models and vast textual and multimodal corpora. However, these advances remain unevenly distributed, disproportionately benefiting high-resource languages while Indigenous and endangered languages, especially those employing diverse and less widely supported writing systems, remain underrepresented. This paper examines the role of writing system diversity in NLP, with a focus on Indigenous and endangered languages. We propose a theoretical framework that accounts for variation across writing systems and its implications for computational modeling. Specifically, we (i) provide an overview of writing system diversity, (ii) synthesize available computational resources, and (iii) present a structured analysis of challenges in modeling, tokenization, and evaluation. Our analysis shows that writing system diversity reveals structural biases embedded in current NLP pipelines. We conclude by identifying key open challenges and outlining directions for future research toward more inclusive, script-aware NLP approaches that better account for writing system variation.
Anthology ID:
2026.computel-1.13
Volume:
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Godfred Agyapong, Sarah Moeller, Antti Arppe, Ali Marashian, Daisy Rosenblum
Venues:
ComputEL | WS
Publisher:
Association for Computational Linguistics
Pages:
118–124
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.13/
Cite (ACL):
Ngoc Tan Le, Mamady Traore, Cristian Ahumada Oliva, and Fatiha Sadat. 2026. Indigenous Writing Systems Matter: Rethinking NLP beyond Alphabetic Bias through Script-Aware Modeling. In Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9), pages 118–124, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Indigenous Writing Systems Matter: Rethinking NLP beyond Alphabetic Bias through Script-Aware Modeling (Le et al., ComputEL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.13.pdf
Supplementary material:
2026.computel-1.13.SupplementaryMaterial.txt