J. Elizabeth Liebl

2026

Aspects of Selecting the Right ASR Training Languages for Under-Resourced Languages
J. Elizabeth Liebl | Summer Chambers | Matthew Kelley | Géraldine Walther
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)

We investigate how training languages should be selected for cross-lingual IPA ASR on unseen languages. Using Common Voice audio and Vox Communis phonetic transcripts, we train multilingual IPA-based ASR models for Upper Sorbian, Luganda, and Tatar under three linguistically motivated selection strategies: genealogical relatedness, geographic proximity, and phonological inventory overlap. We compare these strategies to a random baseline and evaluate performance with phone error rate. Linguistically informed selection generally improves transfer, but no single strategy is consistently optimal. Geographic proximity performs best for Luganda, phonological overlap is slightly best for Tatar, and none of the proposed strategies outperform random selection for Upper Sorbian. The results suggest that linguistic similarity aids low-resource ASR transfer, but that the most useful dimension of similarity varies by target language.

2025

Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations
Poorvi Acharya | J. Elizabeth Liebl | Dhiman Goswami | Kai North | Marcos Zampieri | Antonios Anastasopoulos
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Language transfer is an important topic of research in second language acquisition and computational linguistics. The availability of suitable learner corpora is paramount for the study of second language acquisition (SLA) and language transfer. However, curating learner corpora is a challenging endeavor, as high-quality learner data is rarely publicly available. As a result, only a few such corpora are available to the community. To address this important gap, we present LENS, a novel English learner corpus with longitudinal data that enables researchers to investigate language learning over time. LENS contains 687 instances written by speakers of 15 different L1s. We use LENS to perform two important tasks at the intersection of SLA and computational linguistics: (1) Native Language Identification (NLI); and (2) an evaluation of large language models as a tool for high-precision, semi-automated annotation of L1 interference features.

Co-authors

Kai North 1

Géraldine Walther 1

Marcos Zampieri 1
