Olivia Sammons


2024

pdf
Creating Digital Learning and Reference Resources for Southern Michif
Heather Souter | Olivia Sammons | David Huggins Daines
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages

Minority and Indigenous languages are often under-documented and under-resourced. Where such resources do exist, particularly in the form of legacy materials, they are often inaccessible to learners and educators involved in revitalization efforts, whether due to the limitations of their original formats or the structure of their contents. Digitizing such resources and making them available on a variety of platforms is one step in overcoming these barriers. This is a major undertaking which requires significant expertise at the intersection of documentary linguistics, computational linguistics, and software development, and must be done while walking alongside speakers and language specialists in the community. We discuss the particular strategies and challenges involved in the development of one such resource, and make recommendations for future projects with a similar goal of mobilizing legacy language resources.

2020

pdf
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
Roland Kuhn | Fineen Davis | Alain Désilets | Eric Joanis | Anna Kazantseva | Rebecca Knowles | Patrick Littell | Delaney Lothian | Aidan Pine | Caroline Running Wolf | Eddie Santos | Darlene Stewart | Gilles Boulianne | Vishwa Gupta | Brian Maracle Owennatékha | Akwiratékha’ Martin | Christopher Cox | Marie-Odile Junker | Olivia Sammons | Delasie Torkornoo | Nathan Thanyehténhas Brinklow | Sara Child | Benoît Farley | David Huggins-Daines | Daisy Rosenblum | Heather Souter
Proceedings of the 28th International Conference on Computational Linguistics

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages), software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects.