While human values play a crucial role in making arguments persuasive, extensive datasets for analyzing the values underlying arguments at scale have so far been lacking. To address this gap, we present the Touché23-ValueEval dataset, an expansion of the Webis-ArgValues-22 dataset. We collected and annotated 4780 new arguments, doubling the dataset’s size to 9324 arguments. The arguments stem from six diverse sources, covering religious texts, community discussions, free-text arguments, newspaper editorials, and political debates. Following the methodology established for the original dataset, each argument was annotated by three crowdworkers for 54 human values. The Touché23-ValueEval dataset was used in SemEval 2023 Task 4: ValueEval: Identification of Human Values behind Arguments, where an ensemble of transformer models demonstrated state-of-the-art performance. Furthermore, our experiments show that a fine-tuned large language model, Llama-2-7B, achieves comparable results.
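To make the classification setup concrete, the following is a minimal sketch (not the authors' released code) of multi-label value classification with a transformer head via Hugging Face transformers. The checkpoint, the example argument, and the use of the 20 value categories as labels are assumptions for illustration; the dataset itself annotates 54 values, and fine-tuning on the annotated arguments would precede inference.

```python
# Hedged sketch: multi-label value classification with a transformer head.
# "bert-base-uncased" is a placeholder checkpoint, not the authors' model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 20  # assumed: one label per universal value category

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

# Hypothetical input argument; the classification head here is randomly
# initialized, so real use requires fine-tuning first.
argument = "Nuclear power should be phased out because accidents endanger everyone."
inputs = tokenizer(argument, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Each value category is predicted independently; 0.5 is a common
# (though tunable) decision threshold on the sigmoid scores.
probs = torch.sigmoid(logits).squeeze(0)
predicted = (probs > 0.5).nonzero(as_tuple=True)[0].tolist()
print(predicted)  # indices of predicted value categories
```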
Argumentation is ubiquitous in natural language communication, from politics and media to everyday work and private life. Many arguments derive their persuasive power from human values, such as self-directed thought or tolerance, though often only implicitly. These values are key to understanding the semantics of arguments, as they are generally accepted as justifications for why a particular option is ethically desirable. Can automated systems uncover the values on which an argument draws? To answer this question, 39 teams submitted runs to ValueEval’23. Using a multi-sourced dataset of over 9K arguments, the systems achieved F1-scores up to 0.87 (nature) and over 0.70 for three more of the 20 universal value categories. However, many challenges remain, as evidenced by the low peak F1-score of 0.39 for stimulation, hedonism, face, and humility.
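The per-category scores quoted above are F1-scores computed independently for each value category in a multi-label setting. A hedged sketch of that style of evaluation with scikit-learn follows; the three categories and the toy label matrices are illustrative stand-ins, not data from the shared task.

```python
# Hedged sketch: per-category and macro F1 for multi-label value predictions.
import numpy as np
from sklearn.metrics import f1_score

# Illustrative subset of the 20 value categories.
categories = ["self-direction: thought", "stimulation", "hedonism"]

# One row per argument, one column per category (1 = argument draws on it).
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 1, 0]])

# average=None yields one F1-score per category, as reported above.
per_category = f1_score(y_true, y_pred, average=None)
for name, score in zip(categories, per_category):
    print(f"{name}: {score:.2f}")

# Headline task numbers are typically macro-averaged over all categories.
print("macro F1:", round(f1_score(y_true, y_pred, average="macro"), 2))
```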
We approach aspect-based argument mining as a supervised machine learning task: classifying arguments into semantically coherent groups that refer to the same predefined aspect categories. As an exemplary use case, we introduce the Argument Aspect Corpus - Nuclear Energy, which separates arguments about the topic of nuclear energy into nine major aspects. Since collecting training data for further aspects and topics is costly, we investigate whether current transformer-based few-shot learning approaches can classify argument aspects accurately. The best approach is applied to a British newspaper corpus covering the debate on nuclear energy over the past 21 years. Our evaluation shows that a stable prediction of the shares of argument aspects in this debate is feasible with 50 to 100 training samples per aspect. Moreover, we observe signs of a clear shift in public discourse in favor of nuclear energy in recent years. Revealing such changing patterns of pro and con arguments on certain aspects over time demonstrates the potential of supervised argument aspect detection for tracking issue-specific media discourses.
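Once every argument in the corpus has a predicted aspect, the trend analysis described above reduces to a simple per-year aggregation. The sketch below assumes hypothetical per-argument records (the column names, aspect labels, and years are illustrative, not taken from the corpus) and computes yearly aspect shares with pandas.

```python
# Hedged sketch: yearly shares of predicted argument aspects over time.
import pandas as pd

# Hypothetical output of an aspect classifier run over a newspaper corpus.
records = [
    {"year": 2003, "aspect": "safety"},
    {"year": 2003, "aspect": "costs"},
    {"year": 2003, "aspect": "safety"},
    {"year": 2021, "aspect": "climate"},
    {"year": 2021, "aspect": "climate"},
    {"year": 2021, "aspect": "safety"},
]
df = pd.DataFrame(records)

# Share of each aspect among all classified arguments of a given year.
shares = (
    df.groupby("year")["aspect"]
      .value_counts(normalize=True)
      .rename("share")
      .reset_index()
)
print(shares)
```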