Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models

Jonathan David Mutal, Pierrette Bouillon, Johanna Gerlach, Marianne Starlander


Abstract
Multilingual pre-trained language models are often the best option in low-resource settings. In the context of a cascade architecture for automatic Standard German captioning of spoken Swiss German, we evaluate different models on the task of transforming normalised Swiss German ASR output into Standard German. Instead of training a large model from scratch, we fine-tune publicly available pre-trained models, which reduces the cost of training high-quality neural machine translation models. Results show that pre-trained multilingual models achieve the highest scores and that including more languages in pre-training improves performance. We also observe that the type of source and target included in the fine-tuning data affects the results.
Anthology ID:
2023.mtsummit-users.6
Volume:
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Month:
September
Year:
2023
Address:
Macau SAR, China
Editors:
Masaru Yamada, Felix do Carmo
Venue:
MTSummit
Publisher:
Asia-Pacific Association for Machine Translation
Pages:
65–76
URL:
https://aclanthology.org/2023.mtsummit-users.6
Cite (ACL):
Jonathan David Mutal, Pierrette Bouillon, Johanna Gerlach, and Marianne Starlander. 2023. Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models. In Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track, pages 65–76, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal):
Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models (Mutal et al., MTSummit 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2023.mtsummit-users.6.pdf