Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus

Kalvin Hartwig; Evan Lucas; Timothy Havens

doi:10.18653/v1/2023.americasnlp-1.8

Abstract

The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method that uses support vector machines to classify two dialects (Eastern and Southwestern Ojibwe) from a very small corpus of text. Classification accuracy at the sentence level is 90% across five-fold cross-validation, and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are released openly on GitHub at [link to be inserted for final version, working demonstration notebook uploaded with paper].
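The classification setup described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the character-bigram features, the Pegasos-style training loop, the hyperparameters, and all function names (`featurize`, `train_linear_svm`, `cross_validate`) are assumptions made here, and the toy strings are invented placeholders rather than Ojibwe text. It trains a linear support vector machine on labeled strings and reports accuracy for each of five folds.

```python
import random
from collections import Counter

def featurize(text, n=2):
    """Character n-gram counts plus a constant bias feature (assumed features)."""
    padded = f"^{text.lower()}$"
    feats = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    feats["<bias>"] = 1
    return feats

def score(weights, feats):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(key, 0.0) * value for key, value in feats.items())

def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 or -1; returns a sparse weight dict."""
    rng = random.Random(seed)
    weights, order, step = {}, list(range(len(samples))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)              # decaying learning rate
            margin = labels[i] * score(weights, samples[i])
            for key in weights:                   # shrinkage from the L2 term
                weights[key] *= 1.0 - eta * lam
            if margin < 1.0:                      # hinge loss is active
                for key, value in samples[i].items():
                    weights[key] = weights.get(key, 0.0) + eta * labels[i] * value
    return weights

def predict(weights, feats):
    return 1 if score(weights, feats) >= 0 else -1

def cross_validate(texts, labels, k=5, seed=0):
    """Shuffle once, split into k folds, and return the accuracy on each fold."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    feats = [featurize(t) for t in texts]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        weights = train_linear_svm([feats[i] for i in train],
                                   [labels[i] for i in train], seed=seed)
        correct = sum(predict(weights, feats[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Invented placeholder strings (NOT Ojibwe data): class +1 and class -1.
class_a = ["kaki", "koka", "kiko", "kaka", "kiki",
           "koko", "kako", "kika", "koki", "kaki ko"]
class_b = ["tati", "tota", "tito", "tata", "titi",
           "toto", "tato", "tita", "toti", "tati to"]
texts = class_a + class_b
labels = [1] * len(class_a) + [-1] * len(class_b)

fold_accuracies = cross_validate(texts, labels)
print("per-fold accuracy:", fold_accuracies)
```

To mirror the word-level evaluation mentioned in the abstract, the same trained weights could be applied to single-word strings through `predict(weights, featurize(word))`. In practice a library SVM implementation would likely replace the hand-written training loop.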

Anthology ID: 2023.americasnlp-1.8
Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue: AmericasNLP
Publisher: Association for Computational Linguistics
Pages: 58–66
URL: https://aclanthology.org/2023.americasnlp-1.8
DOI: 10.18653/v1/2023.americasnlp-1.8
Cite (ACL): Kalvin Hartwig, Evan Lucas, and Timothy Havens. 2023. Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 58–66, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus (Hartwig et al., AmericasNLP 2023)
PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2023.americasnlp-1.8.pdf
