Fitting a Square Peg into a Round Hole: Creating a UniMorph dataset of Kanien’kéha Verbs

Anna Kazantseva, Akwiratékha Martin, Karin Michelson, Jean-Pierre Koenig


Abstract
This paper describes efforts to annotate a dataset of verbs in the Iroquoian language Kanien’kéha (a.k.a. Mohawk) using the UniMorph schema (Batsuren et al. 2022a). The dataset is based on the output of a symbolic model, a hand-built verb conjugator. Morphological constituents of each verb are automatically annotated with UniMorph tags. Overall, the process was smooth, but some central features of the language did not fall neatly into the schema, which resulted in a large number of custom tags and a somewhat ad hoc mapping process. We think the same difficulties are likely to arise for other Iroquoian languages and perhaps for other North American language families. This paper describes our decision-making process with respect to Kanien’kéha and reports preliminary results of morphological induction experiments using the dataset.
Anthology ID:
2024.computel-1.7
Volume:
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Sarah Moeller, Godfred Agyapong, Antti Arppe, Aditi Chaudhary, Shruti Rijhwani, Christopher Cox, Ryan Henke, Alexis Palmer, Daisy Rosenblum, Lane Schwartz
Venues:
ComputEL | WS
Publisher:
Association for Computational Linguistics
Pages:
39–51
URL:
https://aclanthology.org/2024.computel-1.7
Cite (ACL):
Anna Kazantseva, Akwiratékha Martin, Karin Michelson, and Jean-Pierre Koenig. 2024. Fitting a Square Peg into a Round Hole: Creating a UniMorph dataset of Kanien’kéha Verbs. In Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 39–51, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Fitting a Square Peg into a Round Hole: Creating a UniMorph dataset of Kanien’kéha Verbs (Kazantseva et al., ComputEL-WS 2024)
PDF:
https://preview.aclanthology.org/improve-issue-templates/2024.computel-1.7.pdf