A Swiss German Dictionary: Variation in Speech and Writing

Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardžić, Claudiu Musat


Abstract
We introduce a dictionary containing normalized forms of common words in various Swiss German dialects into High German. As Swiss German is, for now, a predominantly spoken language, there is a significant variation in the written forms, even between speakers of the same dialect. To alleviate the uncertainty associated with this diversity, we complement the pairs of Swiss German - High German words with the Swiss German phonetic transcriptions (SAMPA). This dictionary becomes thus the first resource to combine large-scale spontaneous translation with phonetic transcriptions. Moreover, we control for the regional distribution and insure the equal representation of the major Swiss dialects. The coupling of the phonetic and written Swiss German forms is powerful. We show that they are sufficient to train a Transformer-based phoneme to grapheme model that generates credible novel Swiss German writings. In addition, we show that the inverse mapping - from graphemes to phonemes - can be modeled with a transformer trained with the novel dictionary. This generation of pronunciations for previously unknown words is key in training extensible automated speech recognition (ASR) systems, which are key beneficiaries of this dictionary.
Anthology ID:
2020.lrec-1.331
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
2720–2725
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.331
DOI:
Bibkey:
Cite (ACL):
Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardžić, and Claudiu Musat. 2020. A Swiss German Dictionary: Variation in Speech and Writing. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2720–2725, Marseille, France. European Language Resources Association.
Cite (Informal):
A Swiss German Dictionary: Variation in Speech and Writing (Schmidt et al., LREC 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.lrec-1.331.pdf