Statistical and Neural Methods for Hawaiian Orthography Modernization

Jaden Kapali, Keaton Williamson, Winston Wu


Abstract
Hawaiian is written in two distinct spelling systems, both of which are used by communities of speakers today. The two systems are distinguished by the presence of the ‘okina letter and the kahakō diacritic, which represent glottal stops and long vowels, respectively. We develop several models of varying complexity to convert between these two orthographies. Our results demonstrate that simple statistical n-gram models, surprisingly, outperform neural seq2seq models and LLMs, highlighting the potential of traditional machine learning approaches in low-resource settings.
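To make the conversion task concrete, the following is a minimal sketch and not the authors' models. It assumes the ‘okina is encoded as U+02BB (or U+2018) and the kahakō as a macron that decomposes to U+0304, and it replaces the paper's n-gram models with a much simpler word-level lookup that picks the most frequent modern spelling of each unmarked word. The function names and the toy word list are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

OKINA = "\u02bb"                    # assumed encoding of the 'okina letter
OKINA_VARIANTS = {OKINA, "\u2018"}  # some texts use a left single quote instead
MACRON = "\u0304"                   # combining macron, i.e. the kahako

def strip_modern_marks(text):
    """Map modern spelling to the unmarked one: drop 'okina and kahako."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed
            if ch not in OKINA_VARIANTS and ch != MACRON]
    return unicodedata.normalize("NFC", "".join(kept))

def train_lookup(modern_words):
    """Learn, for each unmarked word, its most frequent modern spelling."""
    counts = defaultdict(Counter)
    for word in modern_words:
        counts[strip_modern_marks(word)][word] += 1
    return {old: c.most_common(1)[0][0] for old, c in counts.items()}

def modernize(text, table):
    """Restore marks word by word; unseen words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())

# Toy word list (hypothetical training data, not from the paper).
toy_words = [f"Hawai{OKINA}i", f"{OKINA}okina", "kahak\u014d"]
table = train_lookup(toy_words)
restored = modernize("Hawaii okina kahako", table)
```

Because the lookup ignores context, it cannot separate words that share one unmarked spelling; conditioning on neighbouring characters or words, as an n-gram model does, is one way to address that.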
Anthology ID:
2025.emnlp-main.1782
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
35137–35143
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1782/
Cite (ACL):
Jaden Kapali, Keaton Williamson, and Winston Wu. 2025. Statistical and Neural Methods for Hawaiian Orthography Modernization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35137–35143, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Statistical and Neural Methods for Hawaiian Orthography Modernization (Kapali et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1782.pdf
Checklist:
 2025.emnlp-main.1782.checklist.pdf