Jacqueline Brixey

2026

Can code-switching improve the user experience with a dialogue system app for recording endangered languages?
Jacqueline Brixey | David Traum
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

This paper investigates whether a multilingual spoken dialogue system can be used to help collect and preserve endangered language data. In this work, we extend DAPEL (Dialogue APp for Endangered Languages), which is designed to help preserve any language. Our focus, for testing purposes, is on the American Indigenous language Choctaw. The system uses English as a common language, and we test whether incorporating code-switching—the act of alternating between languages—enhances the user experience and/or increases the amount of recorded language data. Our results indicate that users have a positive response to interacting in both languages with the system, that the system plays a meaningful role in language documentation, and, notably, that participants who speak Choctaw as their first language are more receptive to a code-switching system than to a monolingual English-based system.

IndigiEval: Evaluating LLMs in North American Indigenous Languages
Julia Mainzinger | Jacqueline Brixey
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This paper presents IndigiEval, a framework for evaluating the language and cultural proficiency of several commercially available large language models (LLMs) across five North American Indigenous languages (Mvskoke, Choctaw, Cherokee, Cheyenne, and Hawaiian). This framework is a qualitative evaluation method intended for communities with small speaker populations to be able to critically evaluate LLM performance with minimal data and human effort. IndigiEval includes tasks such as answering cultural questions, translation, text generation, and speech recognition. The results of our experiments indicate that no currently available LLM performs well across all evaluation categories, and that LLMs frequently hallucinate orthographies, grammatical structures, cultural knowledge, and vocabulary for all languages and cultures considered. Our proposed evaluation framework is not intended as a comprehensive score, but rather a qualitative and flexible framework to inform language communities about a given LLM’s potential as a resource, since each language has unique environments, strengths, and availability of resources.

Towards a Community-accessible Cahuilla corpus: Developing HTR for J.P. Harrington’s handwritten fieldnotes on Mountain Cahuilla
Ray Huaute | Jacqueline Brixey
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

This paper describes ongoing work to develop a corpus of Cahuilla language from the John Peabody Harrington collection, which contains linguistic and ethnographic fieldnotes documenting Indigenous languages of California and other regions across the Americas. Handwritten notes present numerous processing challenges, including scratch-outs, multilingual entries in Spanish and other Indigenous languages, unique abbreviations, and varying script orientations. We compare the efficacy of deep learning text recognition models to convert images of the notes into a machine-readable format, with a focus on respecting tribal data sovereignty in our methods. We find that Pylaia is the most accurate model for our data. Finally, we present the preliminary findings and indicate future directions for developing a Cahuilla corpus.

2025

Does a code-switching dialogue system help users learn conversational fluency in Choctaw?
Jacqueline Brixey | David Traum
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

We investigate the learning outcomes and user response to a chatbot for practicing conversational Choctaw, an endangered American Indigenous language. Conversational fluency is a goal for many language learners; however, for learners of endangered languages in North America, access to fluent speakers may be limited. Chatbots are potentially ideal dialogue partners, as this kind of dialogue system fulfills a non-authoritative role by focusing on carrying on a conversation as an equal conversational partner. The goal of the chatbot investigated in this work is to serve as a conversational partner in the absence of a fluent Choctaw-speaking human interlocutor. We investigate the impact of code-switching in the interaction, comparing a bilingual chatbot against a monolingual Choctaw version. We evaluate the systems for user engagement and enjoyment, as well as gains in conversational fluency from interacting with the system.

2020

Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length
Jacqueline Brixey | David Sides | Timothy Vizthum | David Traum | Khalil Iskarous
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work introduces additions to the corpus ChoCo, a multimodal corpus for the American Indigenous language Choctaw. Using texts from the corpus, we develop new computational resources with two off-the-shelf tools: word2vec and Linguistica. Our work illustrates how these tools can be successfully applied to a small corpus.

2018

Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing
Ramesh Manuvinakurike | Jacqueline Brixey | Trung Bui | Walter Chang | Doo Soon Kim | Ron Artstein | Kallirroi Georgila
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Chahta Anumpa: A multimodal corpus of the Choctaw Language
Jacqueline Brixey | Eli Pincus | Ron Artstein
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

We present the implementation of an autonomous chatbot, SHIHbot, deployed on Facebook, which answers a wide variety of sexual health questions on HIV/AIDS. The chatbot’s response database is compiled from professional medical and public health resources in order to provide reliable information to users. The system’s backend is NPCEditor, a response selection platform trained on linked questions and answers; to our knowledge this is the first retrieval-based chatbot deployed on a large public social network.