Christopher Cox


2024

pdf bib
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages
Sarah Moeller | Godfred Agyapong | Antti Arppe | Aditi Chaudhary | Shruti Rijhwani | Christopher Cox | Ryan Henke | Alexis Palmer | Daisy Rosenblum | Lane Schwartz
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages

2023

pdf
Speech-to-text recognition for multilingual spoken data in language documentation
Lorena Martín Rodríguez | Christopher Cox
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages

2022

pdf
An Expanded Finite-State Transducer for Tsuut’ina Verbs
Joshua Holden | Christopher Cox | Antti Arppe
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes the expansion of a finite state transducer (FST) for the transitive verb system of Tsuut’ina (ISO 639-3: srs), a Dene (Athabaskan) language spoken in Alberta, Canada. Dene languages have unique templatic morphology, in which lexical, inflectional and derivational tiers are interlaced. Drawing on data from close to 9,000 verbal forms, the expanded model can handle a great range of common and rare argument structure types, including ditransitive and uniquely Dene object experiencer verbs. While challenges of speed remain, this expansion shows the ability of FST modelling to handle morphology of this type, and the expnded FST shows great promise for community language applications such as a morphologically informed online dictionary and word predictor, and for further FST development. This paper describes the expansion of a finite state transducer (FST) for the transitive verb system of Tsuut’ina (ISO 639-3: srs), a Dene (Athabaskan) language spoken in Alberta, Canada. Dene languages have unique templatic morphology, in which lexical, inflectional and derivational tiers are interlaced. Drawing on data from over 12,000 verbs forms, the expanded model can handle a great range of common and rare argument structure types, including ditransitive and uniquely Dene object experiencer verbs. While challenges of speed remain, this expansion shows the ability of FST modelling to handle morphology of this type, and the expnded FST shows great promise for community language applications such as a morphologically informed online dictionary and word predictor, and for further FST development.

pdf
Gi2Pi Rule-based, index-preserving grapheme-to-phoneme transformations
Aidan Pine | Patrick William Littell | Eric Joanis | David Huggins-Daines | Christopher Cox | Fineen Davis | Eddie Antonio Santos | Shankhalika Srikanth | Delasie Torkornoo | Sabrina Yu
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

This paper describes the motivation and implementation details for a rule-based, index-preserving grapheme-to-phoneme engine ‘Gi2Pi' implemented in pure Python and released under the open source MIT license. The engine and interface have been designed to prioritize the developer experience of potential contributors without requiring a high level of programming knowledge. ‘Gi2Pi' already provides mappings for 30 (mostly Indigenous) languages, and the package is accompanied by a web-based interactive development environment, a RESTful API, and extensive documentation to encourage the addition of more mappings in the future. We also present three downstream applications of ‘Gi2Pi' and show results of a preliminary evaluation.

2021

pdf
User-friendly Automatic Transcription of Low-resource Languages: Plugging ESPnet into Elpis
Oliver Adams | Benjamin Galliot | Guillaume Wisniewski | Nicholas Lambourne | Ben Foley | Rahasya Sanders-Dwyer | Janet Wiles | Alexis Michaud | Séverine Guillaume | Laurent Besacier | Christopher Cox | Katya Aplonova | Guillaume Jacques | Nathan Hill
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

2020

pdf
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
Roland Kuhn | Fineen Davis | Alain Désilets | Eric Joanis | Anna Kazantseva | Rebecca Knowles | Patrick Littell | Delaney Lothian | Aidan Pine | Caroline Running Wolf | Eddie Santos | Darlene Stewart | Gilles Boulianne | Vishwa Gupta | Brian Maracle Owennatékha | Akwiratékha’ Martin | Christopher Cox | Marie-Odile Junker | Olivia Sammons | Delasie Torkornoo | Nathan Thanyehténhas Brinklow | Sara Child | Benoît Farley | David Huggins-Daines | Daisy Rosenblum | Heather Souter
Proceedings of the 28th International Conference on Computational Linguistics

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages), software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects.

pdf
Endangered Languages meet Modern NLP
Antonios Anastasopoulos | Christopher Cox | Graham Neubig | Hilaria Cruz
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

This tutorial will focus on NLP for endangered languages documentation and revitalization. First, we will acquaint the attendees with the process and the challenges of language documentation, showing how the needs of the language communities and the documentary linguists map to specific NLP tasks. We will then present the state-of-the-art in NLP applied in this particularly challenging setting (extremely low-resource datasets, noisy transcriptions, limited annotations, non-standard orthographies). In doing so, we will also analyze the challenges of working in this domain and expand on both the capabilities and the limitations of current NLP approaches. Our ultimate goal is to motivate more NLP practitioners to work towards this very important direction, and also provide them with the tools and understanding of the limitations/challenges, both of which are needed in order to have an impact.

2018

pdf
Indigenous language technologies in Canada: Assessment, challenges, and successes
Patrick Littell | Anna Kazantseva | Roland Kuhn | Aidan Pine | Antti Arppe | Christopher Cox | Marie-Odile Junker
Proceedings of the 27th International Conference on Computational Linguistics

In this article, we discuss which text, speech, and image technologies have been developed, and would be feasible to develop, for the approximately 60 Indigenous languages spoken in Canada. In particular, we concentrate on technologies that may be feasible to develop for most or all of these languages, not just those that may be feasible for the few most-resourced of these. We assess past achievements and consider future horizons for Indigenous language transliteration, text prediction, spell-checking, approximate search, machine translation, speech recognition, speaker diarization, speech synthesis, optical character recognition, and computer-aided language learning.

pdf
A Computational Architecture for the Morphology of Upper Tanana
Olga Lovick | Christopher Cox | Miikka Silfverberg | Antti Arppe | Mans Hulden
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2004

pdf
Standards for Language Codes: developing ISO 639
David Dalby | Lee Gillam | Christopher Cox | Debbie Garside
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

pdf
Knowledge Exchange and Terminology Interchange: The Role of Standards
Lee Gillam | Khurshid Ahmad | David Dalby | Christopher Cox
Proceedings of Translating and the Computer 24