On the Tolerance of Repetition Before Performance Degradation in Kiswahili Automatic Speech Recognition

Kathleen Siminyu, Kathy Reid, Ryakitimboruby@gmail.com, Bmwasaru@gmail.com, Chenai@chenai.africa


Abstract
State-of-the-art end-to-end automatic speech recognition (ASR) models require large speech datasets for training. The Mozilla Common Voice project crowd-sources read speech to address this need. However, this approach often results in many audio utterances being recorded for each written sentence. Using Kiswahili speech data, this paper first explores how much repetition (multiple recordings of the same sentence) is permissible in a training set before model performance degrades, and then examines the extent to which audio augmentation techniques can increase the diversity of speech characteristics and improve accuracy. We find that repetition up to a ratio of 1 sentence to 8 audio recordings improves performance, but that performance degrades at a ratio of 1:16. We also find small improvements from frequency masking, time masking and tempo augmentation. Our findings provide guidance on training set construction for ASR practitioners, particularly those working in underserved languages.
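To make the sentence-to-recording ratio concrete, the sketch below shows one plausible way to build training subsets with a bounded amount of repetition: group crowd-sourced clips by their prompt sentence and keep at most `max_per_sentence` recordings for each. This is not necessarily the authors' procedure. The input format (a list of `(sentence, clip_path)` pairs, as might be read from a Common Voice TSV file) and the helper names `cap_repetitions` and `repetition_ratio` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def cap_repetitions(rows, max_per_sentence, seed=0):
    """Keep at most `max_per_sentence` recordings of each distinct sentence.

    `rows` is an iterable of (sentence, clip_path) pairs (a hypothetical
    input format). Returns a list of (sentence, clip_path) pairs.
    """
    by_sentence = defaultdict(list)
    for sentence, clip in rows:
        by_sentence[sentence].append(clip)

    rng = random.Random(seed)  # fixed seed so subsets are reproducible
    kept = []
    for sentence in sorted(by_sentence):
        clips = by_sentence[sentence]
        # Sample without replacement, so no clip appears twice in the subset.
        chosen = rng.sample(clips, min(max_per_sentence, len(clips)))
        kept.extend((sentence, clip) for clip in chosen)
    return kept


def repetition_ratio(rows):
    """Average number of recordings per distinct sentence."""
    rows = list(rows)
    sentences = {sentence for sentence, _ in rows}
    return len(rows) / len(sentences) if sentences else 0.0


if __name__ == "__main__":
    # Toy data: one sentence recorded 20 times, another recorded 3 times.
    data = [("habari", f"h{i}.mp3") for i in range(20)]
    data += [("asante", f"a{i}.mp3") for i in range(3)]
    subset = cap_repetitions(data, max_per_sentence=8)
    print(len(subset), repetition_ratio(subset))  # prints "11 5.5"
```

A per-sentence cap bounds the worst-case repetition (here, no sentence exceeds 8 recordings), whereas the average ratio can fall below the cap when some sentences have few recordings; comparing caps such as 8 and 16 mirrors the two ratios contrasted in the abstract.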
Anthology ID: 2025.africanlp-1.3
Volume: Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Constantine Lignos, Idris Abdulmumin, David Adelani
Venues: AfricaNLP | WS
Publisher: Association for Computational Linguistics
Pages: 15–23
URL: https://preview.aclanthology.org/landing_page/2025.africanlp-1.3/
DOI: 10.18653/v1/2025.africanlp-1.3
Cite (ACL): Kathleen Siminyu, Kathy Reid, Ryakitimboruby@gmail.com, Bmwasaru@gmail.com, and Chenai@chenai.africa. 2025. On the Tolerance of Repetition Before Performance Degradation in Kiswahili Automatic Speech Recognition. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 15–23, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): On the Tolerance of Repetition Before Performance Degradation in Kiswahili Automatic Speech Recognition (Siminyu et al., AfricaNLP 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.africanlp-1.3.pdf