Eyal Dolev


2024

Does Whisper Understand Swiss German? An Automatic, Qualitative, and Human Evaluation
Eyal Dolev | Clemens Lutz | Noëmi Aepli
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)

Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper’s training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper’s performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Schönberger et al., 2021), STT4SG-350 (Plüss et al., 2023), and Swiss Parliaments Corpus (Plüss et al., 2021). In addition, we create a new test set for this study based on short mock clinical interviews. For the automatic evaluation, we use word error rate (WER) and BLEU. We also conduct a qualitative analysis of Whisper’s performance, discussing its strengths and weaknesses. Finally, 28 people participated in a survey evaluating Whisper’s performance. All of our evaluations show that Whisper is a viable ASR system for Swiss German, as long as Standard German output is desired.
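
As an illustration of the word error rate mentioned above, the following is a minimal sketch, not the authors' evaluation code. It assumes plain whitespace tokenization with no text normalization (casing, punctuation), and the function name `word_error_rate` and the example sentences are hypothetical. WER is computed as the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace; no
    lowercasing or punctuation stripping is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical Standard German reference and system output.
    reference = "das ist ein kurzer test"
    hypothesis = "das war ein kurzer test"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. In a dialect-to-standard setting such as this one, a valid paraphrase in Standard German is also penalized as if it were an error, which is one reason to pair WER with BLEU and human judgments.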