Evaluation of Morphological Segmentation Methods for Hupa

Nathaniel Parkes, Zoey Liu


Abstract
Building downstream NLP applications on tokenization systems that use morphological segmentation has proven fruitful for certain morphologically rich languages. Yet indigenous and endangered languages, which tend to be highly polysynthetic and are therefore potential beneficiaries of this approach, pose additional difficulties because annotated data for morphological segmentation is limited. In this study, we develop morphological segmentation models for Hupa, a critically endangered Dene/Athabaskan language of North America. With a total of 595 word types, we seek to identify an optimal morphological segmentation model and illustrate how the tested models perform under different levels of training data limitation. We propose a simple method that casts morphological segmentation as a binary sequence classification task. While this approach does not outperform the established practice of multi-class classification, it outperforms neural alternatives. This work is intended as a starting point for future technological developments for Hupa that leverage its morphological properties, and we hope it can serve as a point of reference for work on other indigenous languages studied under similar constraints.
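To make the binary framing concrete, the sketch below shows one plausible way to encode a segmented word as per-character labels, where 1 marks a character followed by a morpheme boundary and 0 marks any other character. This is an illustration only: the abstract does not specify the paper's label scheme or classifier, the function names `boundary_labels` and `segment` are hypothetical, and the example word is an English placeholder rather than Hupa data.

```python
def boundary_labels(morphemes):
    """Encode a segmented word as one binary label per character.

    Assumed scheme (not taken from the paper): label 1 means a morpheme
    boundary follows this character; label 0 means no boundary follows.
    """
    morphemes = [m for m in morphemes if m]  # ignore empty segments
    labels = []
    for i, morpheme in enumerate(morphemes):
        is_last = i == len(morphemes) - 1
        labels.extend([0] * (len(morpheme) - 1))
        labels.append(0 if is_last else 1)  # no boundary after the final character
    return labels


def segment(word, labels):
    """Decode per-character binary labels back into a list of morphemes."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    morphemes, current = [], ""
    for char, label in zip(word, labels):
        current += char
        if label == 1:  # a predicted boundary closes the current morpheme
            morphemes.append(current)
            current = ""
    if current:
        morphemes.append(current)
    return morphemes


# English placeholder example (not Hupa data)
labels = boundary_labels(["un", "lock", "ed"])
print(labels)
print(segment("unlocked", labels))
```

Under this encoding, any classifier that predicts a 0 or 1 for each character can be turned into a segmenter by passing its predictions to `segment`.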
Anthology ID: 2025.computel-main.22
Volume: Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month: March
Year: 2025
Address: Honolulu, Hawaii, USA
Editors: Jordan Lachler, Godfred Agyapong, Antti Arppe, Sarah Moeller, Aditi Chaudhary, Shruti Rijhwani, Daisy Rosenblum
Venues: ComputEL | WS
Publisher: Association for Computational Linguistics
Pages: 188–193
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.22/
Cite (ACL): Nathaniel Parkes and Zoey Liu. 2025. Evaluation of Morphological Segmentation Methods for Hupa. In Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 188–193, Honolulu, Hawaii, USA. Association for Computational Linguistics.
Cite (Informal): Evaluation of Morphological Segmentation Methods for Hupa (Parkes & Liu, ComputEL 2025)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.22.pdf