Aspects of Selecting the Right ASR Training Languages for Under-Resourced Languages

J. Elizabeth Liebl, Summer Chambers, Matthew Kelley, Géraldine Walther


Abstract
We investigate how training languages should be selected for cross-lingual IPA ASR on unseen languages. Using Common Voice audio and Vox Communis phonetic transcripts, we train multilingual IPA-based ASR models for Upper Sorbian, Luganda, and Tatar under three linguistically motivated selection strategies: genealogical relatedness, geographic proximity, and phonological inventory overlap. We compare these strategies to a random baseline and evaluate performance with phone error rate. Linguistically informed selection generally improves transfer, but no single strategy is consistently optimal. Geographic proximity performs best for Luganda, phonological overlap is slightly best for Tatar, and none of the proposed strategies outperform random selection for Upper Sorbian. The results suggest that linguistic similarity aids low-resource ASR transfer, but that the most useful dimension of similarity varies by target language.
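The abstract reports results as phone error rate (PER) without stating how it is computed. A minimal sketch follows, assuming the common definition: the Levenshtein distance between reference and hypothesis phone sequences, summed over the test set and divided by the total number of reference phones. The function names and the corpus-level normalization are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference phones to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phone_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Corpus-level PER: total edits divided by total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_phones


# Hypothetical example: one substitution in a three-phone reference.
per = phone_error_rate([["k", "a", "t"]], [["k", "o", "t"]])
```

Normalizing over the whole corpus rather than averaging per-utterance rates keeps short utterances from dominating the score; whether the paper does the same is not stated in the abstract.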
Anthology ID: 2026.computel-1.16
Volume: Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Godfred Agyapong, Sarah Moeller, Antti Arppe, Ali Marashian, Daisy Rosenblum
Venues: ComputEL | WS
Publisher: Association for Computational Linguistics
Pages: 148–156
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.16/
Cite (ACL):
J. Elizabeth Liebl, Summer Chambers, Matthew Kelley, and Géraldine Walther. 2026. Aspects of Selecting the Right ASR Training Languages for Under-Resourced Languages. In Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9), pages 148–156, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Aspects of Selecting the Right ASR Training Languages for Under-Resourced Languages (Liebl et al., ComputEL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.computel-1.16.pdf
Supplementary material: 2026.computel-1.16.SupplementaryMaterial.txt