Jacqueline Brixey
2026
Can code-switching improve the user experience with a dialogue system app for recording endangered languages?
Jacqueline Brixey | David Traum
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
This paper investigates whether a multilingual spoken dialogue system can be used to help collect and preserve endangered language data. In this work, we extend DAPEL (Dialogue APp for Endangered Languages), which is designed to help preserve any language. Our focus, for testing purposes, is on the American Indigenous language Choctaw. The system uses English as a common language, and we test whether incorporating code-switching—the act of alternating between languages—enhances the user experience and/or increases the amount of recorded language data. Our results indicate that users have a positive response to interacting in both languages with the system, that the system plays a meaningful role in language documentation, and, notably, that participants who speak Choctaw as their first language are more receptive to a code-switching system than to a monolingual English-based system.
IndigiEval: Evaluating LLMs in North American Indigenous Languages
Julia Mainzinger | Jacqueline Brixey
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper presents IndigiEval, a framework for evaluating the language and cultural proficiency of several commercially available large language models (LLMs) across five North American Indigenous languages (Mvskoke, Choctaw, Cherokee, Cheyenne, and Hawaiian). This framework is a qualitative evaluation method intended for communities with small speaker populations to be able to critically evaluate LLM performance with minimal data and human effort. IndigiEval includes tasks such as answering cultural questions, translation, text generation, and speech recognition. The results of our experiments indicate that no currently available LLM performs well across all evaluation categories, and that LLMs frequently hallucinate orthographies, grammatical structures, cultural knowledge, and vocabulary for all languages and cultures considered. Our proposed evaluation framework is not intended as a comprehensive score, but rather a qualitative and flexible framework to inform language communities about a given LLM’s potential as a resource, since each language has unique environments, strengths, and availability of resources.
Towards a Community-accessible Cahuilla corpus: Developing HTR for J.P. Harrington’s handwritten fieldnotes on Mountain Cahuilla
Ray Huaute | Jacqueline Brixey
Proceedings of the Sixth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper describes ongoing work to develop a corpus of Cahuilla language from the John Peabody Harrington collection, which contains linguistic and ethnographic fieldnotes documenting Indigenous languages of California and other regions across the Americas. Handwritten notes present numerous processing challenges, including scratch-outs, multilingual entries in Spanish and other Indigenous languages, unique abbreviations, and varying script orientations. We compare the efficacy of deep learning text recognition models to convert images of the notes into a machine-readable format, with a focus on respecting tribal data sovereignty in our methods. We find that PyLaia is the most accurate model for our data. Finally, we present the preliminary findings and indicate future directions for developing a Cahuilla corpus.
2025
Does a code-switching dialogue system help users learn conversational fluency in Choctaw?
Jacqueline Brixey | David Traum
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
We investigate the learning outcomes and user response to a chatbot for practicing conversational Choctaw, an endangered American Indigenous language. Conversational fluency is a goal for many language learners; however, for learners of endangered languages in North America, access to fluent speakers may be limited. Chatbots are potentially ideal dialogue partners because this kind of dialogue system fulfills a non-authoritative role, carrying on a conversation as an equal conversational partner. The goal of the chatbot investigated in this work is to serve as a conversational partner in the absence of a fluent Choctaw-speaking human interlocutor. We investigate the impact of code-switching in the interaction, comparing a bilingual chatbot against a monolingual Choctaw version. We evaluate the systems for user engagement and enjoyment, as well as gains in conversational fluency from interacting with the system.
2020
Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length
Jacqueline Brixey | David Sides | Timothy Vizthum | David Traum | Khalil Iskarous
Proceedings of the Twelfth Language Resources and Evaluation Conference
This work introduces additions to the corpus ChoCo, a multimodal corpus for the American Indigenous language Choctaw. Using texts from the corpus, we develop new computational resources with two off-the-shelf tools: word2vec and Linguistica. Our work illustrates how these tools can be applied successfully to a small corpus.
2018
DialEdit: Annotations for Spoken Conversational Image Editing
Ramesh Manuvinakurike | Jacqueline Brixey | Trung Bui | Walter Chang | Ron Artstein | Kallirroi Georgila
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing
Ramesh Manuvinakurike | Jacqueline Brixey | Trung Bui | Walter Chang | Doo Soon Kim | Ron Artstein | Kallirroi Georgila
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Chahta Anumpa: A multimodal corpus of the Choctaw Language
Jacqueline Brixey | Eli Pincus | Ron Artstein
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
SHIHbot: A Facebook chatbot for Sexual Health Information on HIV/AIDS
Jacqueline Brixey | Rens Hoegen | Wei Lan | Joshua Rusow | Karan Singla | Xusen Yin | Ron Artstein | Anton Leuski
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
We present the implementation of an autonomous chatbot, SHIHbot, deployed on Facebook, which answers a wide variety of sexual health questions on HIV/AIDS. The chatbot’s response database is compiled from professional medical and public health resources in order to provide reliable information to users. The system’s backend is NPCEditor, a response selection platform trained on linked questions and answers; to our knowledge this is the first retrieval-based chatbot deployed on a large public social network.