Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages

Siyu Liang, Gina-Anne Levow


Abstract
The development of Automatic Speech Recognition (ASR) has yielded impressive results, but its use in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages while controlling for training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R reaches parity once training data exceed one hour. We provide linguistically grounded analysis and offer insights towards practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to mitigate the transcription bottleneck in language documentation.
Anthology ID:
2025.fieldmatters-1.3
Volume:
Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina, Ekaterina Vylomova
Venues:
FieldMatters | WS
Publisher:
Association for Computational Linguistics
Pages:
26–37
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.fieldmatters-1.3/
Cite (ACL):
Siyu Liang and Gina-Anne Levow. 2025. Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages. In Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, pages 26–37, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages (Liang & Levow, FieldMatters 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.fieldmatters-1.3.pdf