Reihaneh Amooie


2026

Low-resource automatic speech recognition (ASR) is challenging because annotated data is scarce. While synthetic data from text-to-speech (TTS) systems can augment ASR training, its efficacy for low-resource languages remains unclear. In this study, we investigate the conditions under which TTS-based data augmentation is most effective for low-resource languages. Experiments on six low-resource languages in Common Voice show that synthetic data is most beneficial under extremely low-resource ASR conditions (i.e., less than one hour of available real speech data) or for languages with larger amounts of TTS data (i.e., more than 10 hours). Additionally, increasing the amount and diversity of synthetic data while maintaining an appropriate ratio of synthetic to real data can further improve ASR performance.