Conversational Speech Recognition Needs Data? Experiments with Austrian German
Julian Linke, Philip N. Garner, Gernot Kubin, Barbara Schuppler
Abstract
Conversational speech represents one of the most complex automatic speech recognition (ASR) tasks owing to the high inter-speaker variation in both pronunciation and conversational dynamics. Such complexity is particularly sensitive to low-resourced (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large amounts of otherwise unrelated data. In this study, we characterise an LR Austrian German conversational task. We begin with a non-pre-trained baseline and show that fine-tuning of a model pre-trained using self-supervision leads to improvements consistent with those in the literature; this extends to cases where a lexicon and language model are included. We also show that the advantage of pre-training indeed arises from the larger database rather than the self-supervision. Further, by use of a leave-one-conversation-out technique, we demonstrate that robustness problems remain with respect to inter-speaker and inter-conversation variation. This serves to guide where future research might best be focused in light of the current state of the art.
- Anthology ID:
- 2022.lrec-1.500
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4684–4691
- URL:
- https://aclanthology.org/2022.lrec-1.500
- Cite (ACL):
- Julian Linke, Philip N. Garner, Gernot Kubin, and Barbara Schuppler. 2022. Conversational Speech Recognition Needs Data? Experiments with Austrian German. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4684–4691, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Conversational Speech Recognition Needs Data? Experiments with Austrian German (Linke et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.500.pdf