Zara Maxwell-Smith


2023

Automated speech recognition of Indonesian-English language lessons on YouTube using transfer learning
Zara Maxwell-Smith | Ben Foley
Proceedings of the Second Workshop on NLP Applications to Field Linguistics

Fine-tuning large multilingual models with limited data from a specific domain or setting has the potential to improve automatic speech recognition (ASR) outcomes. This paper reports on the use of the Elpis ASR pipeline to fine-tune two pre-trained base models, Wav2Vec2-XLSR-53 and Wav2Vec2-Large-XLSR-Indonesian, with various mixes of data from three YouTube channels teaching Indonesian with English as the language of instruction. We discuss our results for inference on new lesson audio (22-46% word error rate) in the context of speeding up data collection in diverse and specialised settings. This study is an example of how ASR can be used to accelerate natural language research, expanding ethically sourced data in low-resource settings.