Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey

Diego Bear, Paul Cook


Abstract
Finite-state approaches to morphological analysis have been shown to improve the performance of natural language processing systems for polysynthetic languages, in which words are generally composed of many morphemes, on tasks such as language modelling (Schwartz et al., 2020). However, finite-state morphological analyzers are expensive to construct and require expert knowledge of a language’s structure. Currently, there is no broad-coverage finite-state model of morphology for Wolastoqey, also known as Passamaquoddy-Maliseet, an endangered low-resource Algonquian language. We therefore investigate using two unsupervised models, MorphAGram and Morfessor, to obtain morphological segmentations for Wolastoqey. We train MorphAGram and Morfessor models on a small corpus of Wolastoqey words and evaluate them on two annotated datasets. Our results indicate that MorphAGram outperforms Morfessor for morphological segmentation of Wolastoqey.
Anthology ID:
2022.sigul-1.20
Volume:
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venue:
SIGUL
SIG:
SIGUL
Publisher:
European Language Resources Association
Pages:
155–160
URL:
https://aclanthology.org/2022.sigul-1.20
Cite (ACL):
Diego Bear and Paul Cook. 2022. Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 155–160, Marseille, France. European Language Resources Association.
Cite (Informal):
Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey (Bear & Cook, SIGUL 2022)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/2022.sigul-1.20.pdf