Abstract
The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method that uses support vector machines to classify two dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90% across five-fold cross-validation, and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are released openly on GitHub at [link to be inserted for final version, working demonstration notebook uploaded with paper].

- Anthology ID: 2023.americasnlp-1.8
- Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
- Venue: AmericasNLP
- Publisher: Association for Computational Linguistics
- Pages: 58–66
- URL: https://aclanthology.org/2023.americasnlp-1.8
- DOI: 10.18653/v1/2023.americasnlp-1.8
- Cite (ACL): Kalvin Hartwig, Evan Lucas, and Timothy Havens. 2023. Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 58–66, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus (Hartwig et al., AmericasNLP 2023)
- PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2023.americasnlp-1.8.pdf
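
The abstract does not say which text features, SVM solver, or hyperparameters were used. The sketch below is therefore only a minimal illustration of this kind of experiment under stated assumptions, not the authors' implementation. It assumes each text is represented as a bag of character n-grams (lengths 1 to 3), trains a linear SVM with a Pegasos-style stochastic subgradient solver (no bias term) on the L2-regularized hinge loss, and reports mean accuracy over five shuffled folds. All function names and parameter values (`char_ngrams`, `train_linear_svm`, `k_fold_accuracy`, `lam=0.01`, `epochs=50`) are hypothetical, and the toy strings are synthetic placeholders, not Ojibwe text.

```python
import random
from collections import Counter

# Assumed setup (not taken from the paper): character n-gram counts as
# features, a Pegasos-style linear SVM without a bias term, and an
# unstratified k-fold split.


def char_ngrams(text, n_min=1, n_max=3):
    """Bag of character n-grams (lengths n_min..n_max) of a lowercased string."""
    text = text.lower()
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats


def score(w, x):
    """Linear decision value w . x over sparse feature counts."""
    return sum(w[f] * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM: stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = Counter()
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x, y = samples[i], labels[i]
            margin = y * score(w, x)  # margin under the current weights
            for f in list(w):  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: step toward y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the decision value."""
    return 1 if score(w, x) > 0 else -1


def k_fold_accuracy(texts, labels, k=5, seed=0):
    """Mean held-out accuracy over k folds of a shuffled split."""
    if len(texts) < k:
        raise ValueError("need at least k examples")
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    feats = [char_ngrams(s) for s in texts]
    accs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_linear_svm([feats[i] for i in train],
                             [labels[i] for i in train], seed=seed)
        correct = sum(predict(w, feats[i]) == labels[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / k


if __name__ == "__main__":
    # Synthetic placeholder strings, NOT Ojibwe data: "dialect A" (+1) uses
    # only the letters z/o and "dialect B" (-1) only the letters q/e.
    texts = ["zo", "zoz", "ozz", "zzo", "oz", "qe", "qeq", "eqq", "qqe", "eq"]
    labels = [1] * 5 + [-1] * 5
    print(f"5-fold accuracy: {k_fold_accuracy(texts, labels):.2f}")  # prints 5-fold accuracy: 1.00
```

To mirror the abstract's second evaluation (a sentence-trained model applied to single words), one would train once on the features of every sentence and then call `predict(w, char_ngrams(word))` for each word. An established library is a more likely choice in practice; for example, scikit-learn's `LinearSVC` combined with a character-level `CountVectorizer` or `TfidfVectorizer` (`analyzer="char"` or `"char_wb"`, with an `ngram_range`) sets up a comparable pipeline.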