EDAudio: Easy Data Augmentation for Dialectal Audio

Lea Fischbach, Akbar Karimi, Alfred Lameli, Lucie Flek


Abstract
We investigate lightweight and easily applicable data augmentation techniques for dialectal audio classification. We evaluate four main methods, namely pitch shifting, interval removal, background noise insertion and interval swap, as well as several subvariants, on recordings from 20 German dialects. Each main method is tested across multiple hyperparameter combinations, including augmentation length, coverage ratio and number of augmentations per original sample. Our results show that frequency-based techniques, particularly frequency masking, consistently improve performance, while others, such as time masking or speaker-based insertion, can degrade the results. Our comparative analysis identifies which augmentations are most effective under realistic conditions, offering simple and efficient strategies to improve dialectal speech classification.
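The abstract names the augmentation families only at a high level. The sketch below is a minimal NumPy illustration of what several of them might look like. It is not the authors' implementation. All function names and parameters (`interval_removal`, `length_s`, `snr_db`, `max_width`, and so on) are hypothetical. Mixing noise at a target SNR and applying SpecAugment-style masks to a (frequency, time) spectrogram are assumptions about how "background noise insertion" and "frequency/time masking" could be realized. Pitch shifting is left out to avoid extra dependencies; a library routine such as `librosa.effects.pitch_shift` is a common choice for it.

```python
import numpy as np


def interval_removal(wave, sr, length_s, rng):
    """Delete one randomly placed interval of length_s seconds (hypothetical variant)."""
    n = int(round(length_s * sr))
    if n <= 0 or n >= len(wave):
        return wave.copy()
    start = rng.integers(0, len(wave) - n + 1)
    return np.concatenate([wave[:start], wave[start + n:]])


def interval_swap(wave, sr, length_s, rng):
    """Swap two non-overlapping intervals of length_s seconds each."""
    n = int(round(length_s * sr))
    if n <= 0 or 2 * n > len(wave):
        return wave.copy()
    a = rng.integers(0, len(wave) - 2 * n + 1)  # start of the first interval
    b = rng.integers(a + n, len(wave) - n + 1)  # second interval starts after the first ends
    out = wave.copy()
    out[a:a + n], out[b:b + n] = wave[b:b + n], wave[a:a + n]
    return out


def add_background_noise(wave, noise, snr_db, rng):
    """Mix a noise clip into wave at a target signal-to-noise ratio in dB (assumed scheme)."""
    if len(noise) < len(wave):  # repeat short noise clips so they cover the whole signal
        noise = np.tile(noise, int(np.ceil(len(wave) / len(noise))))
    start = rng.integers(0, len(noise) - len(wave) + 1)
    noise = noise[start:start + len(wave)]
    p_signal = np.mean(wave ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return wave + scale * noise


def frequency_mask(spec, max_width, rng):
    """Zero a random band of up to max_width frequency bins; spec has shape (freq, time)."""
    width = rng.integers(0, min(max_width, spec.shape[0]) + 1)
    f0 = rng.integers(0, spec.shape[0] - width + 1)
    out = spec.copy()
    out[f0:f0 + width, :] = 0.0
    return out


def time_mask(spec, max_width, rng):
    """Zero a random block of up to max_width frames; spec has shape (freq, time)."""
    width = rng.integers(0, min(max_width, spec.shape[1]) + 1)
    t0 = rng.integers(0, spec.shape[1] - width + 1)
    out = spec.copy()
    out[:, t0:t0 + width] = 0.0
    return out
```

Calling one of these functions several times on the same recording, each time with fresh random draws from the same `np.random.default_rng(...)` generator, yields several augmented copies per original sample. In this sketch, the interval length (`length_s`) and the mask width (`max_width`) play the role of the augmentation-length hyperparameter.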
Anthology ID:
2025.ranlp-1.44
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
363–368
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.44/
Cite (ACL):
Lea Fischbach, Akbar Karimi, Alfred Lameli, and Lucie Flek. 2025. EDAudio: Easy Data Augmentation for Dialectal Audio. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 363–368, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
EDAudio: Easy Data Augmentation for Dialectal Audio (Fischbach et al., RANLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.44.pdf