@inproceedings{yuan-etal-2022-data,
    title = "Data Augmentation for the Post-Stroke Speech Transcription ({PSST}) Challenge: Sometimes Less Is More",
    author = "Yuan, Jiahong and
      Cai, Xingyu and
      Church, Kenneth",
    editor = "Kokkinakis, Dimitrios and
      Themistocleous, Charalambos K. and
      Fors, Kristina Lundholm and
      Tsanas, Athanasios and
      Fraser, Kathleen C.",
    booktitle = "Proceedings of the RaPID Workshop - Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments - within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.rapid-1.9/",
    pages = "71--79",
    abstract = "We employ the method of fine-tuning wav2vec2.0 for recognition of phonemes in aphasic speech. Our effort focuses on data augmentation, by supplementing data from both in-domain and out-of-domain datasets for training. We found that although a modest amount of out-of-domain data may be helpful, the performance of the model degrades significantly when the amount of out-of-domain data is much larger than in-domain data. Our hypothesis is that fine-tuning wav2vec2.0 with a CTC loss not only learns bottom-up acoustic properties but also top-down constraints. Therefore, out-of-domain data augmentation is likely to degrade performance if there is a language model mismatch between {\textquotedblleft}in{\textquotedblright} and {\textquotedblleft}out{\textquotedblright} domains. For in-domain audio without ground truth labels, we found that it is beneficial to exclude samples with less confident pseudo labels. Our final model achieves 16.7{\%} PER (phoneme error rate) on the validation set, without using a language model for decoding. The result represents a relative error reduction of 14{\%} over the baseline model trained without data augmentation. Finally, we found that {\textquotedblleft}canonicalized{\textquotedblright} phonemes are much easier to recognize than manually transcribed phonemes.",
    internal-note = {url normalized from ephemeral preview.aclanthology.org staging link to the canonical Anthology URL}
}
Markdown (Informal)
[Data Augmentation for the Post-Stroke Speech Transcription (PSST) Challenge: Sometimes Less Is More](https://aclanthology.org/2022.rapid-1.9/) (Yuan et al., RaPID 2022)
ACL