Modelling the Reduplicating Lushootseed Morphology with an FST and LSTM

Jack Rueter, Mika Hämäläinen, Khalid Alnajjar


Abstract
In this paper, we present an FST-based approach to morphological analysis, lemmatization and generation of Lushootseed words. We also use the FST to generate training data for an LSTM-based neural model, which we train to perform morphological analysis. The neural model reaches 71.9% accuracy on the test data. In addition, we discuss the types of reduplication found in Lushootseed word forms. The approach draws on both attested instances of reduplication and bare stems to which a variety of reduplications are applied, as it is unclear how much variation can be attributed to the individual speakers and authors of the source materials. That is, there may be areal factors that align with certain types of reduplication and their frequencies.
Anthology ID:
2023.americasnlp-1.6
Volume:
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month:
July
Year:
2023
Address:
Toronto, Canada
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
40–46
URL:
https://aclanthology.org/2023.americasnlp-1.6
DOI:
10.18653/v1/2023.americasnlp-1.6
Cite (ACL):
Jack Rueter, Mika Hämäläinen, and Khalid Alnajjar. 2023. Modelling the Reduplicating Lushootseed Morphology with an FST and LSTM. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 40–46, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Modelling the Reduplicating Lushootseed Morphology with an FST and LSTM (Rueter et al., AmericasNLP 2023)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2023.americasnlp-1.6.pdf