Connecting Automated Speech Recognition to Transcription Practices

Blaine Billings; Bradley McDonnell

Abstract

One of the greatest issues facing documentary linguists is the transcription bottleneck. While the large quantity of audio and video data generated as part of a documentary project serves as a long-lasting record of the language, without corresponding text transcriptions, it remains largely inaccessible for revitalization efforts and linguistic analysis. Automated Speech Recognition (ASR) is frequently proposed as the solution to this problem. However, two issues often prevent documentary linguists from making use of ASR models: (1) the belief that the typical documentary project does not have sufficient data to develop an adequate ASR model and (2) the concern that correcting the output of an ASR model would be more time-consuming for transcribers than simply creating a transcription from scratch. In this paper, we tackle both of these issues by developing an ASR model in the larger context of a documentation project for Nasal, a low-resource language of western Indonesia. Fine-tuning a larger pre-trained language model on 25 hours of transcribed Nasal speech, we produce a model that has a 44% word error rate. Despite this relatively high error rate, tests comparing the speed of transcribing from scratch with that of correcting ASR-generated transcripts show that the ASR model can significantly speed up the transcription process.
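For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, the function name `word_error_rate` is hypothetical, and whitespace tokenization is an assumption (the abstract does not say how words were tokenized or normalized).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: words are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 44% word error rate means roughly 44 word-level edits would be needed per 100 reference words to correct the output. Because insertions count as errors, the value can exceed 100%.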

Anthology ID: 2025.computel-main.14
Volume: Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month: March
Year: 2025
Address: Honolulu, Hawaii, USA
Editors: Jordan Lachler, Godfred Agyapong, Antti Arppe, Sarah Moeller, Aditi Chaudhary, Shruti Rijhwani, Daisy Rosenblum
Venues: ComputEL | WS
Publisher: Association for Computational Linguistics
Pages: 128–132
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.14/
Cite (ACL): Blaine Billings and Bradley McDonnell. 2025. Connecting Automated Speech Recognition to Transcription Practices. In Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 128–132, Honolulu, Hawaii, USA. Association for Computational Linguistics.
Cite (Informal): Connecting Automated Speech Recognition to Transcription Practices (Billings & McDonnell, ComputEL 2025)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.14.pdf
