Mirko Sommer


2026

Moralizations – arguments that invoke moral values to justify demands or positions – are a yet underexplored form of persuasive communication. We present the Moralization Corpus, a novel multi-genre dataset designed to analyze how moral values are strategically used in argumentative discourse. Moralizations are pragmatically complex and often implicit, posing significant challenges for both human annotators and NLP systems. We develop a frame-based annotation scheme that captures the constitutive elements of moralizations – moral values, demands, and discourse protagonists – and apply it to a diverse set of German texts, including political debates, news articles, and online discussions. The corpus enables fine-grained analysis of moralizing language across communicative formats and domains. We further evaluate several large language models (LLMs) under varied prompting conditions for the task of moralization detection and moralization component extraction and compare it to human annotations in order to investigate the challenges of automatic and manual analysis of moralizations. Results show that detailed prompt instructions have a greater effect than few-shot or explanation-based prompting, and that moralization remains a highly subjective and context-sensitive task. We release all data, annotation guidelines, and code to foster future interdisciplinary research on moral discourse and moral reasoning in NLP.
Protagonists play a central role in moral discourse by structuring responsibility and authority, yet computational work has largely focused on moral values rather than the actors involved. We address this gap by studying phrase-level protagonist detection and classification in the Moralization Corpus (Becker et al., 2025), a dataset of moral arguments across different text genres. We decompose the task into identifying protagonist mentions and classifying them by what kind of actor they are (e.g., individual or institution) and what function they serve in the moral argument.We compare fine-tuned lightweight models, state-of-the-art NER models, and prompting-based large language models. We further establish human baselines and analyze the impact of contextual information on human and model decisions. Our results show that fine-tuned NER models achieve competitive detection performance at substantially lower cost than prompted large language models, and that role classification benefits more strongly from contextualized prompting. Across tasks, top-performing models reach or exceed human-level performance, highlighting the value of task decomposition for modeling protagonists in moral discourse.We release our code, predictions, and supplementary material in our project repository.