Abstract
In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple.
- Anthology ID:
- 2022.slpat-1.5
- Volume:
- Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Sarah Ebling, Emily Prud’hommeaux, Preethi Vaidyanathan
- Venue:
- SLPAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 37–43
- URL:
- https://aclanthology.org/2022.slpat-1.5
- DOI:
- 10.18653/v1/2022.slpat-1.5
- Cite (ACL):
- Johanna Gerlach, Jonathan Mutal, and Pierrette Bouillon. 2022. Producing Standard German Subtitles for Swiss German TV Content. In Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022), pages 37–43, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Producing Standard German Subtitles for Swiss German TV Content (Gerlach et al., SLPAT 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.slpat-1.5.pdf