Automatic Phone Alignment of Code-switched Urum–Russian Field Data

Emily Ahn; Eleanor Chodroff; Gina-Anne Levow

Abstract

Code-switching, using multiple languages in a single utterance, is a common means of communication. In the language documentation process, speakers may code-switch between the target language and a language of broader communication; however, how to handle this mixed speech data is not always clearly addressed for speech research and specifically for a corpus phonetics pipeline. This paper investigates best practices for conducting phone-level forced alignment of code-switched field data using the Urum speech dataset from DoReCo. This dataset comprises 117 minutes of narrative utterances, of which 42% contain code-switched Urum–Russian speech. We demonstrate that the inclusion of Russian speech and Russian pretrained acoustic models can aid the alignment of Urum phones. Beyond using boundary alignment precision and accuracy metrics, we also discovered that the method of acoustic modeling impacted a downstream corpus phonetics investigation of code-switched Urum–Russian.

Anthology ID: 2025.fieldmatters-1.1
Volume: Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina, Ekaterina Vylomova
Venues: FieldMatters | WS
Publisher: Association for Computational Linguistics
Pages: 1–14
URL: https://preview.aclanthology.org/corrections-2025-08/2025.fieldmatters-1.1/
Cite (ACL): Emily Ahn, Eleanor Chodroff, and Gina-Anne Levow. 2025. Automatic Phone Alignment of Code-switched Urum–Russian Field Data. In Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, pages 1–14, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Automatic Phone Alignment of Code-switched Urum–Russian Field Data (Ahn et al., FieldMatters 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.fieldmatters-1.1.pdf
