Truecasing German user-generated conversational text

Yulia Grishina; Thomas Gueudre; Ralf Winkler

doi:10.18653/v1/2020.wnut-1.19

Abstract

Truecasing, the task of restoring proper case to (generally) lowercase input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an intrinsic task and present several experiments on noisy user queries to a voice-controlled dialog system. In particular, we compare a rule-based approach, an n-gram language model (LM) approach, and a recurrent neural network (RNN) approach, evaluating the results on a German Q&A corpus and reporting accuracy for different case categories. We show that while RNNs reach higher accuracy, especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed-case words, where their fall-back mechanisms come into play.

Anthology ID: 2020.wnut-1.19
Volume: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month: November
Year: 2020
Address: Online
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 143–148
URL: https://aclanthology.org/2020.wnut-1.19
DOI: 10.18653/v1/2020.wnut-1.19
Cite (ACL): Yulia Grishina, Thomas Gueudre, and Ralf Winkler. 2020. Truecasing German user-generated conversational text. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 143–148, Online. Association for Computational Linguistics.
Cite (Informal): Truecasing German user-generated conversational text (Grishina et al., WNUT 2020)
PDF: https://preview.aclanthology.org/naacl24-info/2020.wnut-1.19.pdf
