At the Lower End of Language—Exploring the Vulgar and Obscene Side of German

Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn


Abstract
In this paper, we describe a workflow for the data-driven acquisition and semantic scaling of a lexicon that covers lexical items from the lower end of the German language register: terms typically considered rough, vulgar, or obscene. Since fine-grained degrees of obscenity can only inadequately be captured at the categorical level (e.g., obscene vs. non-obscene, or rough vs. vulgar), our main contribution lies in applying best-worst scaling, a rating methodology that has already been shown to be useful for emotional language, to capture the relative strength of obscenity of lexical items. We describe the empirical foundations for bootstrapping such a low-end lexicon for German by starting from manually supplied lexicographic categorizations of a small seed set of rough and vulgar lexical items and automatically enlarging this set by means of distributional semantics. We then determine the degree of obscenity for the full set of acquired lexical items by letting crowdworkers comparatively assess their pejorative grade using best-worst scaling. This semi-automatically enriched lexicon already comprises 3,300 lexical items and incorporates 33,000 vulgarity ratings. Using it as a seed lexicon for fully automatic lexical acquisition, we were able to raise its coverage to slightly more than 11,000 entries.
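To make the rating step more concrete, the following is a minimal sketch of how best-worst scaling judgments are often aggregated into a per-item score by simple counting: the number of times an item was picked as "best" minus the number of times it was picked as "worst", divided by the number of times it was shown. The abstract does not state which aggregation the authors used, so this counting scheme, the function name `bws_scores`, the tuple layout, and the placeholder items `w1` to `w5` are all assumptions for illustration. Here "best" is taken to mean "most obscene" and "worst" to mean "least obscene".

```python
from collections import Counter


def bws_scores(annotations):
    """Aggregate best-worst judgments into scores in [-1, 1].

    `annotations` is an iterable of (items, best, worst) tuples, where
    `items` is the tuple of items shown together, `best` is the item the
    annotator picked as most obscene, and `worst` the one picked as
    least obscene. This layout is hypothetical, not taken from the paper.
    """
    best_counts = Counter()
    worst_counts = Counter()
    shown_counts = Counter()

    for items, best, worst in annotations:
        if best not in items or worst not in items:
            raise ValueError("best and worst must be among the shown items")
        shown_counts.update(items)   # every shown item counts as one appearance
        best_counts[best] += 1
        worst_counts[worst] += 1

    # Counting-based score: (#best - #worst) / #appearances
    return {
        item: (best_counts[item] - worst_counts[item]) / shown_counts[item]
        for item in shown_counts
    }


# Placeholder judgments over abstract items (not real lexicon entries).
example = [
    (("w1", "w2", "w3", "w4"), "w1", "w4"),
    (("w1", "w2", "w3", "w5"), "w1", "w5"),
    (("w2", "w3", "w4", "w5"), "w2", "w4"),
]
scores = bws_scores(example)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:+.2f}")
```

Scores near 1 mark items that annotators consistently chose as the most obscene in their tuple, scores near -1 mark items consistently chosen as the least obscene, and items that were never picked at either extreme end up at 0.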
Anthology ID:
W19-3513
Volume:
Proceedings of the Third Workshop on Abusive Language Online
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
ALW
Publisher:
Association for Computational Linguistics
Pages:
119–128
URL:
https://aclanthology.org/W19-3513
DOI:
10.18653/v1/W19-3513
Cite (ACL):
Elisabeth Eder, Ulrike Krieg-Holz, and Udo Hahn. 2019. At the Lower End of Language—Exploring the Vulgar and Obscene Side of German. In Proceedings of the Third Workshop on Abusive Language Online, pages 119–128, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
At the Lower End of Language—Exploring the Vulgar and Obscene Side of German (Eder et al., ALW 2019)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/W19-3513.pdf