We present results of a project on emotion classification on historical German plays of Enlightenment, Storm and Stress, and German Classicism. We have developed a hierarchical annotation scheme consisting of 13 sub-emotions like suffering, love and joy that sum up to 6 main and 2 polarity classes (positive/negative). We have conducted textual annotations on 11 German plays and have acquired over 13,000 emotion annotations by two annotators per play. We have evaluated multiple traditional machine learning approaches as well as transformer-based models pretrained on historical and contemporary language for a single-label text sequence emotion classification for the different emotion categories. The evaluation is carried out on three different instances of the corpus: (1) taking all annotations, (2) filtering overlapping annotations by annotators, (3) applying a heuristic for speech-based analysis. Best results are achieved on the filtered corpus with the best models being large transformer-based models pretrained on contemporary German language. For the polarity classification accuracies of up to 90% are achieved. The accuracies become lower for settings with a higher number of classes, achieving 66% for 13 sub-emotions. Further pretraining of a historical model with a corpus of dramatic texts led to no improvements.
Data acquisition in dialectology is typically a tedious task, as dialect samples of spoken language have to be collected via questionnaires or interviews. In this article, we suggest to use the “web as a corpus” approach for dialectology. We present a case study that demonstrates how authentic language data for the Bavarian dialect (ISO 639-3:bar) can be collected automatically from the social network Facebook. We also show that Facebook can be used effectively as a crowdsourcing platform, where users are willing to translate dialect words collaboratively in order to create a common lexicon of their Bavarian dialect. Key insights from the case study are summarized as “lessons learned”, together with suggestions for future enhancements of the lexicon creation approach.