Albert Ventayol-Boada

Also published as: Albert Ventayol-boada


2023

Applications of classification trees for endangered language description: Finite verb morphology in Kolyma Yukaghir
Albert Ventayol-Boada
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages

Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir
Albert Ventayol-boada | Nathan Roll | Simon Todd
Proceedings of the Second Workshop on NLP Applications to Field Linguistics

This study investigates the clustering of words into Part-of-Speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate whether classes based on quantifying the distributions in which roots and affixes are used can be useful for language description and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional languages.