Henning Wachsmuth

2022

pdf abs
Identifying the Human Values behind Arguments
Johannes Kiesel | Milad Alshomary | Nicolas Handke | Xiaoni Cai | Henning Wachsmuth | Benno Stein
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper studies the (often implicit) human values behind natural language arguments, such as to have freedom of thought or to be broadminded. Values are commonly accepted answers to why some option is desirable in the ethical sense and are thus essential both in real-world argumentation and theoretical argumentation frameworks. However, their large variety has been a major obstacle to modeling them in argument mining. To overcome this obstacle, we contribute an operationalization of human values, namely a multi-level taxonomy with 54 values that is in line with psychological research. Moreover, we provide a dataset of 5270 arguments from four geographical cultures, manually annotated for human values. First experiments with the automatic classification of human values are promising, with F₁-scores up to 0.81 and 0.25 on average.

pdf abs
The Moral Debater: A Study on the Computational Generation of Morally Framed Arguments
Milad Alshomary | Roxanne El Baff | Timon Gurcke | Henning Wachsmuth
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An audience’s prior beliefs and morals are strong indicators of how likely they will be affected by a given argument. Utilizing such knowledge can help focus on shared values to bring disagreeing parties towards agreement. In argumentation technology, however, this is barely exploited so far. This paper studies the feasibility of automatically generating morally framed arguments as well as their effect on different audiences. Following the moral foundation theory, we propose a system that effectively generates arguments focusing on different morals. In an in-depth user study, we ask liberals and conservatives to evaluate the impact of these arguments. Our results suggest that, particularly when prior beliefs are challenged, an audience becomes more affected by morally framed arguments.

pdf abs
“Mama Always Had a Way of Explaining Things So I Could Understand”: A Dialogue Corpus for Learning to Construct Explanations
Henning Wachsmuth | Milad Alshomary
Proceedings of the 29th International Conference on Computational Linguistics

As AI is more and more pervasive in everyday life, humans have an increasing demand to understand its behavior and decisions. Most research on explainable AI builds on the premise that there is one ideal explanation to be found. In fact, however, everyday explanations are co-constructed in a dialogue between the person explaining (the explainer) and the specific person being explained to (the explainee). In this paper, we introduce a first corpus of dialogical explanations to enable NLP research on how humans explain as well as on how AI can learn to imitate this process. The corpus consists of 65 transcribed English dialogues from the Wired video series 5 Levels, explaining 13 topics to five explainees of different proficiency. All 1550 dialogue turns have been manually labeled by five independent professionals for the topic discussed as well as for the dialogue act and the explanation move performed. We analyze linguistic patterns of explainers and explainees, and we explore differences across proficiency levels. BERT-based baseline results indicate that sequence information helps predicting topics, acts, and moves effectively.

pdf abs
Analyzing Culture-Specific Argument Structures in Learner Essays
Wei-Fan Chen | Mei-Hua Chen | Garima Mudgal | Henning Wachsmuth
Proceedings of the 9th Workshop on Argument Mining

Language education has been shown to benefit from computational argumentation, for example, from methods that assess quality dimensions of language learners’ argumentative essays, such as their organization and argument strength. So far, however, little attention has been paid to cultural differences in learners’ argument structures originating from different origins and language capabilities. This paper extends prior studies of learner argumentation by analyzing differences in the argument structure of essays from culturally diverse learners. Based on the ICLE corpus containing essays written by English learners of 16 different mother tongues, we train natural language processing models to mine argumentative discourse units (ADUs) as well as to assess the essays’ quality in terms of organization and argument strength. The extracted ADUs and the predicted quality scores enable us to look into the similarities and differences of essay argumentation across different English learners. In particular, we analyze the ADUs from learners with different mother tongues, different levels of arguing proficiency, and different context cultures.

2021

pdf
Counter-Argument Generation by Attacking Weak Premises
Milad Alshomary | Shahbaz Syed | Arkajit Dhar | Martin Potthast | Henning Wachsmuth
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Generating Informative Conclusions for Argumentative Texts
Shahbaz Syed | Khalid Al Khatib | Milad Alshomary | Henning Wachsmuth | Martin Potthast
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf abs
Controlled Neural Sentence-Level Reframing of News Articles
Wei-Fan Chen | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2021

Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and sentiment, which can be tackled with neural text generation techniques. However, it is more challenging since changing a frame requires rewriting entire sentences rather than single phrases. In this paper, we study how to computationally reframe sentences in news articles while maintaining their coherence to the context. We treat reframing as a sentence-level fill-in-the-blank task for which we train neural models on an existing media frame corpus. To guide the training, we propose three strategies: framed-language pretraining, named-entity preservation, and adversarial learning. We evaluate respective models automatically and manually for topic consistency, coherence, and successful reframing. Our results indicate that generating properly-framed text works well but with tradeoffs.

pdf abs
Belief-based Generation of Argumentative Claims
Milad Alshomary | Wei-Fan Chen | Timon Gurcke | Henning Wachsmuth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we argue that augmenting argument generation technology with the ability to encode beliefs is of twofold. First, it gives more control on the generated arguments leading to better reach for audience. Second, it is one way of modeling the human process of synthesizing arguments. Therefore, we propose the task of belief-based claim generation, and study the research question of how to model and encode a user’s beliefs into a generated argumentative text. To this end, we model users’ beliefs via their stances on big issues, and extend state of the art text generation models with extra input reflecting user’s beliefs. Through an automatic evaluation we show empirical evidence of the applicability to encode beliefs into argumentative text. In our manual evaluation, we highlight that the low effectiveness of our approach stems from the noise produced by the automatic collection of bag-of-words, which was mitigated by removing this noise. The finding of this paper lays the ground work to further investigate the role of beliefs in generating better reaching arguments.

pdf abs
Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale
Gabriella Skitalinskaya | Jonas Klaff | Henning Wachsmuth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Assessing the quality of arguments and of the claims the arguments are composed of has become a key task in computational argumentation. However, even if different claims share the same stance on the same topic, their assessment depends on the prior perception and weighting of the different aspects of the topic being discussed. This renders it difficult to learn topic-independent quality indicators. In this paper, we study claim quality assessment irrespective of discussed aspects by comparing different revisions of the same claim. We compile a large-scale corpus with over 377k claim revision pairs of various types from kialo.com, covering diverse topics from politics, ethics, entertainment, and others. We then propose two tasks: (a) assessing which claim of a revision pair is better, and (b) ranking all versions of a claim by quality. Our first experiments with embedding-based logistic regression and transformer-based neural networks show promising results, suggesting that learned indicators generalize well across topics. In a detailed error analysis, we give insights into what quality dimensions of claims can be assessed reliably. We provide the data and scripts needed to reproduce all results.

pdf abs
Assessing the Sufficiency of Arguments through Conclusion Generation
Timon Gurcke | Milad Alshomary | Henning Wachsmuth
Proceedings of the 8th Workshop on Argument Mining

The premises of an argument give evidence or other reasons to support a conclusion. However, the amount of support required depends on the generality of a conclusion, the nature of the individual premises, and similar. An argument whose premises make its conclusion rationally worthy to be drawn is called sufficient in argument quality research. Previous work tackled sufficiency assessment as a standard text classification problem, not modeling the inherent relation of premises and conclusion. In this paper, we hypothesize that the conclusion of a sufficient argument can be generated from its premises. To study this hypothesis, we explore the potential of assessing sufficiency based on the output of large-scale pre-trained language models. Our best model variant achieves an F1-score of .885, outperforming the previous state-of-the-art and being on par with human experts. While manual evaluation reveals the quality of the generated conclusions, their impact remains low ultimately.

Key point analysis is the task of extracting a set of concise and high-level statements from a given collection of arguments, representing the gist of these arguments. This paper presents our proposed approach to the Key Point Analysis Shared Task, colocated with the 8th Workshop on Argument Mining. The approach integrates two complementary components. One component employs contrastive learning via a siamese neural network for matching arguments to key points; the other is a graph-based extractive summarization model for generating key points. In both automatic and manual evaluation, our approach was ranked best among all submissions to the shared task.

pdf abs
Syntopical Graphs for Computational Argumentation Tasks
Joe Barrow | Rajiv Jain | Nedim Lipka | Franck Dernoncourt | Vlad Morariu | Varun Manjunatha | Douglas Oard | Philip Resnik | Henning Wachsmuth
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Approaches to computational argumentation tasks such as stance detection and aspect detection have largely focused on the text of independent claims, losing out on potentially valuable context provided by the rest of the collection. We introduce a general approach to these tasks motivated by syntopical reading, a reading process that emphasizes comparing and contrasting viewpoints in order to improve topic understanding. To capture collection-level context, we introduce the syntopical graph, a data structure for linking claims within a collection. A syntopical graph is a typed multi-graph where nodes represent claims and edges represent different possible pairwise relationships, such as entailment, paraphrase, or support. Experiments applying syntopical graphs to the problems of detecting stance and aspects demonstrate state-of-the-art performance in each domain, significantly outperforming approaches that do not utilize collection-level information.

pdf abs
Employing Argumentation Knowledge Graphs for Neural Argument Generation
Khalid Al Khatib | Lukas Trautner | Henning Wachsmuth | Yufang Hou | Benno Stein
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Generating high-quality arguments, while being challenging, may benefit a wide range of downstream applications, such as writing assistants and argument search engines. Motivated by the effectiveness of utilizing knowledge graphs for supporting general text generation tasks, this paper investigates the usage of argumentation-related knowledge graphs to control the generation of arguments. In particular, we construct and populate three knowledge graphs, employing several compositions of them to encode various knowledge into texts of debate portals and relevant paragraphs from Wikipedia. Then, the texts with the encoded knowledge are used to fine-tune a pre-trained text generation model, GPT-2. We evaluate the newly created arguments manually and automatically, based on several dimensions important in argumentative contexts, including argumentativeness and plausibility. The results demonstrate the positive impact of encoding the graphs’ knowledge into debate portal texts for generating arguments with superior quality than those generated without knowledge.

2020

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page. Conventional snippets are extractive in nature, which recently gave rise to copyright claims from news publishers as well as a new copyright legislation being passed in the European Union, limiting the fair use of web page contents for snippets. At the same time, abstractive summarization has matured considerably in recent years, potentially allowing for more personalization of snippets in the future. Taken together, these facts render further research into generating abstractive snippets both timely and promising.

pdf abs
Analyzing the Persuasive Effect of Style in News Editorial Argumentation
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Benno Stein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

News editorials argue about political issues in order to challenge or reinforce the stance of readers with different ideologies. Previous research has investigated such persuasive effects for argumentative content. In contrast, this paper studies how important the style of news editorials is to achieve persuasion. To this end, we first compare content- and style-oriented classifiers on editorials from the liberal NYTimes with ideology-specific effect annotations. We find that conservative readers are resistant to NYTimes style, but on liberals, style even has more impact than content. Focusing on liberals, we then cluster the leads, bodies, and endings of editorials, in order to learn about writing style patterns of effective argumentation.

pdf abs
Target Inference in Argument Conclusion Generation
Milad Alshomary | Shahbaz Syed | Martin Potthast | Henning Wachsmuth
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In argumentation, people state premises to reason towards a conclusion. The conclusion conveys a stance towards some target, such as a concept or statement. Often, the conclusion remains implicit, though, since it is self-evident in a discussion or left out for rhetorical reasons. However, the conclusion is key to understanding an argument and, hence, to any application that processes argumentation. We thus study the question to what extent an argument’s conclusion can be reconstructed from its premises. In particular, we argue here that a decisive step is to infer a conclusion’s target, and we hypothesize that this target is related to the premises’ targets. We develop two complementary target inference approaches: one ranks premise targets and selects the top-ranked target as the conclusion target, the other finds a new conclusion target in a learned embedding space using a triplet neural network. Our evaluation on corpora from two domains indicates that a hybrid of both approaches is best, outperforming several strong baselines. According to human annotators, we infer a reasonably adequate conclusion target in 89% of the cases.

pdf abs
Semi-Supervised Cleansing of Web Argument Corpora
Jonas Dorsch | Henning Wachsmuth
Proceedings of the 7th Workshop on Argument Mining

Debate portals and similar web platforms constitute one of the main text sources in computational argumentation research and its applications. While the corpora built upon these sources are rich of argumentatively relevant content and structure, they also include text that is irrelevant, or even detrimental, to their purpose. In this paper, we present a precision-oriented approach to detecting such irrelevant text in a semi-supervised way. Given a few seed examples, the approach automatically learns basic lexical patterns of relevance and irrelevance and then incrementally bootstraps new patterns from sentences matching the patterns. In the existing args.me corpus with 400k argumentative texts, our approach detects almost 87k irrelevant sentences, at a precision of 0.97 according to manual evaluation. With low effort, the approach can be adapted to other web argument corpora, providing a generic way to improve corpus quality.

pdf abs
Argument from Old Man’s View: Assessing Social Bias in Argumentation
Maximilian Spliethöver | Henning Wachsmuth
Proceedings of the 7th Workshop on Argument Mining

Social bias in language - towards genders, ethnicities, ages, and other social groups - poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on respective data may not only adopt, but even amplify the bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric to measure bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.

pdf abs
Analyzing Political Bias and Unfairness in News Articles at Different Levels of Granularity
Wei-Fan Chen | Khalid Al Khatib | Henning Wachsmuth | Benno Stein
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Media is an indispensable source of information and opinion, shaping the beliefs and attitudes of our society. Obviously, media portals can also provide overly biased content, e.g., by reporting on political events in a selective or incomplete manner. A relevant question hence is whether and how such a form of unfair news coverage can be exposed. This paper addresses the automatic detection of bias, but it goes one step further in that it explores how political bias and unfairness are manifested linguistically. We utilize a new corpus of 6964 news articles with labels derived from adfontesmedia.com to develop a neural model for bias assessment. Analyzing the model on article excerpts, we find insightful bias patterns at different levels of text granularity, from single words to the whole article discourse.

pdf abs
Mining Crowdsourcing Problems from Discussion Forums of Workers
Zahra Nouri | Henning Wachsmuth | Gregor Engels
Proceedings of the 28th International Conference on Computational Linguistics

Crowdsourcing is used in academia and industry to solve tasks that are easy for humans but hard for computers, in natural language processing mostly to annotate data. The quality of annotations is affected by problems in the task design, task operation, and task evaluation that workers face with requesters in crowdsourcing processes. To learn about the major problems, we provide a short but comprehensive survey based on two complementary studies: (1) a literature review where we collect and organize problems known from interviews with workers, and (2) an empirical data analysis where we use topic modeling to mine workers’ complaints from a new English corpus of workers’ forum discussions. While literature covers all process phases, problems in the task evaluation are prevalent, including unfair rejections, late payments, and unjustified blockings of workers. According to the data, however, poor task design in terms of malfunctioning environments, bad workload estimation, and privacy violations seems to bother the workers most. Our findings form the basis for future research on how to improve crowdsourcing processes.

pdf abs
Intrinsic Quality Assessment of Arguments
Henning Wachsmuth | Till Werner
Proceedings of the 28th International Conference on Computational Linguistics

Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument’s arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument’s text. In systematic experiments with eight feature types on an existing corpus, we observe moderate but significant learning success for most dimensions. Rhetorical quality seems hardest to assess, and subjectivity features turn out strong, although length bias in the corpus impedes full validity. We also find that human assessors differ more clearly to each other than to our approach.

pdf abs
SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Giovanni Da San Martino | Alberto Barrón-Cedeño | Henning Wachsmuth | Rostislav Petrov | Preslav Nakov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results and the main findings of SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. The task featured two subtasks. Subtask SI is about Span Identification: given a plain-text document, spot the specific text fragments containing propaganda. Subtask TC is about Technique Classification: given a specific text fragment, in the context of a full document, determine the propaganda technique it uses, choosing from an inventory of 14 possible propaganda techniques. The task attracted a large number of participants: 250 teams signed up to participate and 44 made a submission on the test set. In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For both subtasks, the best systems used pre-trained Transformers and ensembles.

pdf abs
Detecting Media Bias in News Articles using Gaussian Bias Distributions
Wei-Fan Chen | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2020

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. We observe that feature-based and neural text classification approaches which rely only on the distribution of low-level lexical information fail to detect media bias. This weakness becomes most noticeable for articles on new events, where words appear in new contexts and hence their “bias predictiveness” is unclear. In this paper, we therefore study how second-order information about biased statements in an article helps to improve detection effectiveness. In particular, we utilize the probability distributions of the frequency, positions, and sequential order of lexical and informational sentence-level bias in a Gaussian Mixture Model. On an existing media bias dataset, we find that the frequency and positions of biased statements strongly impact article-level bias, whereas their exact sequential order is secondary. Using a standard model for sentence-level bias detection, we provide empirical evidence that article-level bias detectors that use second-order information clearly outperform those without.

pdf abs
Persuasiveness of News Editorials depending on Ideology and Personality
Roxanne El Baff | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

News editorials aim to shape the opinions of their readership and the general public on timely controversial issues. The impact of an editorial on the reader’s opinion does not only depend on its content and style, but also on the reader’s profile. Previous work has studied the effect of editorial style depending on general political ideologies (liberals vs.conservatives). In our work, we dig deeper into the persuasiveness of both content and style, exploring the role of the intensity of an ideology (lean vs.extreme) and the reader’s personality traits (agreeableness, conscientiousness, extraversion, neuroticism, and openness). Concretely, we train content- and style-based models on New York Times editorials for different ideology- and personality-specific groups. Our results suggest that particularly readers with extreme ideology and non “role model” personalities are impacted by style. We further analyze the importance of various text features with respect to the editorials’ impact, the readers’ profile, and the editorials’ geographical scope.

2019

pdf abs
Modeling Frames in Argumentation
Yamen Ajjour | Milad Alshomary | Henning Wachsmuth | Benno Stein
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economical aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the “right” frame(s) to convince the audience to adopt the author’s stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience’s cultural background and interests. This paper introduces frame identification, which is the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12, 326 debate-portal arguments, organized along the frames of the debates’ topics. On this corpus, our approach outperforms different strong baselines, achieving an F1-score of 0.28.

pdf abs
Unraveling the Search Space of Abusive Language in Wikipedia with Dynamic Lexicon Acquisition
Wei-Fan Chen | Khalid Al Khatib | Matthias Hagen | Henning Wachsmuth | Benno Stein
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Many discussions on online platforms suffer from users offending others by using abusive terminology, threatening each other, or being sarcastic. Since an automatic detection of abusive language can support human moderators of online discussion platforms, detecting abusiveness has recently received increased attention. However, the existing approaches simply train one classifier for the whole variety of abusiveness. In contrast, our approach is to distinguish explicitly abusive cases from the more “shadowed” ones. By dynamically extending a lexicon of abusive terms (e.g., including new obfuscations of abusive terms), our approach can support a moderator with explicit unraveled explanations for why something was flagged as abusive: due to known explicitly abusive terms, due to newly detected (obfuscated) terms, or due to shadowed cases.

pdf
Proceedings of the 6th Workshop on Argument Mining
Benno Stein | Henning Wachsmuth
Proceedings of the 6th Workshop on Argument Mining

pdf abs
Computational Argumentation Synthesis as a Language Modeling Task
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Manfred Stede | Benno Stein
Proceedings of the 12th International Conference on Natural Language Generation

Synthesis approaches in computational argumentation so far are restricted to generating claim-like argument units or short summaries of debates. Ultimately, however, we expect computers to generate whole new arguments for a given stance towards some topic, backing up claims following argumentative and rhetorical considerations. In this paper, we approach such an argumentation synthesis as a language modeling task. In our language model, argumentative discourse units are the “words”, and arguments represent the “sentences”. Given a pool of units for any unseen topic-stance pair, the model selects a set of unit types according to a basic rhetorical strategy (logos vs. pathos), arranges the structure of the types based on the units’ argumentative roles, and finally “phrases” an argument by instantiating the structure with semantically coherent units from the pool. Our evaluation suggests that the model can, to some extent, mimic the human synthesis of strategy-specific arguments.

2018

pdf abs
Challenge or Empower: Revisiting Argumentation Quality in a News Editorial Corpus
Roxanne El Baff | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the 22nd Conference on Computational Natural Language Learning

News editorials are said to shape public opinion, which makes them a powerful tool and an important source of political argumentation. However, rarely do editorials change anyone’s stance on an issue completely, nor do they tend to argue explicitly (but rather follow a subtle rhetorical strategy). So, what does argumentation quality mean for editorials then? We develop the notion that an effective editorial challenges readers with opposing stance, and at the same time empowers the arguing skills of readers that share the editorial’s stance — or even challenges both sides. To study argumentation quality based on this notion, we introduce a new corpus with 1000 editorials from the New York Times, annotated for their perceived effect along with the annotators’ political orientations. Analyzing the corpus, we find that annotators with different orientation disagree on the effect significantly. While only 1% of all editorials changed anyone’s stance, more than 5% meet our notion. We conclude that our corpus serves as a suitable resource for studying the argumentation quality of news editorials.

pdf abs
Before Name-Calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Arguing without committing a fallacy is one of the main requirements of an ideal debate. But even when debating rules are strictly enforced and fallacious arguments punished, arguers often lapse into attacking the opponent by an ad hominem argument. As existing research lacks solid empirical investigation of the typology of ad hominem arguments as well as their potential causes, this paper fills this gap by (1) performing several large-scale annotation studies, (2) experimenting with various neural architectures and validating our working hypotheses, such as controversy or reasonableness, and (3) providing linguistic insights into triggers of ad hominem using explainable neural network architectures.

pdf abs
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, the comprehension does not only require language understanding and logic skills, but also depends on common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice.

In times of fake news and alternative facts, pro and con arguments on controversial topics are of increasing importance. Recently, we presented args.me as the first search engine for arguments on the web. In its initial version, args.me ranked arguments solely by their relevance to a topic queried for, making it hard to learn about the diverse topical aspects covered by the search results. To tackle this shortcoming, we integrated a visualization interface for result exploration in args.me that provides an instant overview of the main aspects in a barycentric coordinate system. This topic space is generated ad-hoc from controversial issues on Wikipedia and argument-specific LDA models. In two case studies, we demonstrate how individual arguments can be found easily through interactions with the visualization, such as highlighting and filtering.

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy means to select, arrange, and phrase a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts notably vary for different strategies, especially their arrangement remains stable. The results suggest that our model enables a strategical synthesis.

pdf abs
SemEval-2018 Task 12: The Argument Reasoning Comprehension Task
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 12th International Workshop on Semantic Evaluation

A natural language argument is composed of a claim as well as reasons given as premises for the claim. The warrant explaining the reasoning is usually left implicit, as it is clear from the context and common sense. This makes a comprehension of arguments easy for humans but hard for machines. This paper summarizes the first shared task on argument reasoning comprehension. Given a premise and a claim along with some topic information, the goal was to automatically identify the correct warrant among two candidates that are plausible and lexically close, but in fact imply opposite claims. We describe the dataset with 1970 instances that we built for the task, and we outline the 21 computational approaches that participated, most of which used neural networks. The results reveal the complexity of the task, with many approaches hardly improving over the random accuracy of about 0.5. Still, the best observed accuracy (0.712) underlines the principle feasibility of identifying warrants. Our analysis indicates that an inclusion of external knowledge is key to reasoning comprehension.

pdf abs
Learning to Flip the Bias of News Headlines
Wei-Fan Chen | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the 11th International Conference on Natural Language Generation

This paper introduces the task of “flipping” the bias of news articles: Given an article with a political bias (left or right), generate an article with the same topic but opposite bias. To study this task, we create a corpus with bias-labeled articles from all-sides.com. As a first step, we analyze the corpus and discuss intrinsic characteristics of bias. They point to the main challenges of bias flipping, which in turn lead to a specific setting in the generation process. The paper in hand narrows down the general bias flipping task to focus on bias flipping for news article headlines. A manual annotation of headlines from each side reveals that they are self-informative in general and often convey bias. We apply an autoencoder incorporating information from an article’s content to learn how to automatically flip the bias. From 200 generated headlines, 73 are classified as understandable by annotators, and 83 maintain the topic while having opposite bias. Insights from our analysis shed light on how to solve the main challenges of bias flipping.

pdf abs
Retrieval of the Best Counterargument without Prior Topic Knowledge
Henning Wachsmuth | Shahbaz Syed | Benno Stein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Given any argument on any controversial topic, how to counter it? This question implies the challenging retrieval task of finding the best counterargument. Since prior knowledge of a topic cannot be expected in general, we hypothesize the best counterargument to invoke the same aspects as the argument while having the opposite stance. To operationalize our hypothesis, we simultaneously model the similarity and dissimilarity of pairs of arguments, based on the words and embeddings of the arguments’ premises and conclusions. A salient property of our model is its independence from the topic at hand, i.e., it applies to arbitrary arguments. We evaluate different model variations on millions of argument pairs derived from the web portal idebate.org. Systematic ranking experiments suggest that our hypothesis is true for many arguments: For 7.6 candidates with opposing stance on average, we rank the best counterargument highest with 60% accuracy. Even among all 2801 test set pairs as candidates, we still find the best one about every third time.

pdf abs
Modeling Deliberative Argumentation Strategies on Wikipedia
Khalid Al-Khatib | Henning Wachsmuth | Kevin Lang | Jakob Herpel | Matthias Hagen | Benno Stein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper studies how the argumentation strategies of participants in deliberative discussions can be supported computationally. Our ultimate goal is to predict the best next deliberative move of each participant. In this paper, we present a model for deliberative discussions and we illustrate its operationalization. Previous models have been built manually based on a small set of discussions, resulting in a level of abstraction that is not suitable for move recommendation. In contrast, we derive our model statistically from several types of metadata that can be used for move description. Applied to six million discussions from Wikipedia talk pages, our approach results in a model with 13 categories along three dimensions: discourse acts, argumentative relations, and frames. On this basis, we automatically generate a corpus with about 200,000 turns, labeled for the 13 categories. We then operationalize the model with three supervised classifiers and provide evidence that the proposed categories can be predicted.

2017

Argumentation quality is viewed differently in argumentation theory and in practical assessment approaches. This paper studies to what extent the views match empirically. We find that most observations on quality phrased spontaneously are in fact adequately represented by theory. Even more, relative comparisons of arguments in practice correlate with absolute quality ratings based on theory. Our results clarify how the two views can learn from each other.

pdf abs
Patterns of Argumentation Strategies across Topics
Khalid Al-Khatib | Henning Wachsmuth | Matthias Hagen | Benno Stein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents an analysis of argumentation strategies in news editorials within and across topics. Given nearly 29,000 argumentative editorials from the New York Times, we develop two machine learning models, one for determining an editorial’s topic, and one for identifying evidence types in the editorial. Based on the distribution and structure of the identified types, we analyze the usage patterns of argumentation strategies among 12 different topics. We detect several common patterns that provide insights into the manifestation of argumentation strategies. Also, our experiments reveal clear correlations between the topics and the detected patterns.

pdf abs
The Impact of Modeling Overall Argumentation with Tree Kernels
Henning Wachsmuth | Giovanni Da San Martino | Dora Kiesel | Benno Stein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Several approaches have been proposed to model either the explicit sequential structure of an argumentative text or its implicit hierarchical structure. So far, the adequacy of these models of overall argumentation remains unclear. This paper asks what type of structure is actually important to tackle downstream tasks in computational argumentation. We analyze patterns in the overall argumentation of texts from three corpora. Then, we adapt the idea of positional tree kernels in order to capture sequential and hierarchical argumentative structure together for the first time. In systematic experiments for three text classification tasks, we find strong evidence for the impact of both types of structure. Our results suggest that either of them is necessary while their combination may be beneficial.

Research on computational argumentation faces the problem of how to automatically assess the quality of an argument or argumentation. While different quality dimensions have been approached in natural language processing, a common understanding of argumentation quality is still missing. This paper presents the first holistic work on computational argumentation quality in natural language. We comprehensively survey the diverse existing theories and approaches to assess logical, rhetorical, and dialectical quality dimensions, and we derive a systematic taxonomy from these. In addition, we provide a corpus with 320 arguments, annotated for all 15 dimensions in the taxonomy. Our results establish a common ground for research on computational argumentation quality assessment.

pdf abs
“PageRank” for Argument Relevance
Henning Wachsmuth | Benno Stein | Yamen Ajjour
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now in the focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument’s conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with human average judgments. We outline what natural language challenges must be faced at web scale in order to stepwise bring argument relevance to web search engines.

pdf abs
WAT-SL: A Customizable Web Annotation Tool for Segment Labeling
Johannes Kiesel | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

A frequent type of annotations in text corpora are labeled text segments. General-purpose annotation tools tend to be overly comprehensive, often making the annotation process slower and more error-prone. We present WAT-SL, a new web-based tool that is dedicated to segment labeling and highly customizable to the labeling task at hand. We outline its main features and exemplify how we used it for a crowdsourced corpus with labeled argument units.

Computational argumentation is expected to play a critical role in the future of web search. To make this happen, many search-related questions must be revisited, such as how people query for arguments, how to mine arguments from the web, or how to rank them. In this paper, we develop an argument search framework for studying these and further questions. The framework allows for the composition of approaches to acquiring, mining, assessing, indexing, querying, retrieving, ranking, and presenting arguments while relying on standard infrastructure and interfaces. Based on the framework, we build a prototype search engine, called args, that relies on an initial, freely accessible index of nearly 300k arguments crawled from reliable web resources. The framework and the argument search engine are intended as an environment for collaborative research on computational argumentation and its practical evaluation.

pdf abs
Unit Segmentation of Argumentative Texts
Yamen Ajjour | Wei-Fan Chen | Johannes Kiesel | Henning Wachsmuth | Benno Stein
Proceedings of the 4th Workshop on Argument Mining

The segmentation of an argumentative text into argument units and their non-argumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of various features, when capturing words separately, along with their neighbors, or even along with the entire text. Each such context is reflected by one machine learning model that we evaluate within and across three domains of texts. Among the models, our new deep learning approach capturing the entire text turns out best within all domains, with an F-score of up to 88.54. While structural features generalize best across domains, the domain transfer remains hard, which points to major challenges of unit segmentation.

2016

pdf abs
Using Argument Mining to Assess the Argumentation Quality of Essays
Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Argument mining aims to determine the argumentative structure of texts. Although it is said to be crucial for future applications such as writing support systems, the benefit of its output has rarely been evaluated. This paper puts the analysis of the output into the focus. In particular, we investigate to what extent the mined structure can be leveraged to assess the argumentation quality of persuasive essays. We find insightful statistical patterns in the structure of essays. From these, we derive novel features that we evaluate in four argumentation-related essay scoring tasks. Our results reveal the benefit of argument mining for assessing argumentation quality. Among others, we improve the state of the art in scoring an essay’s organization and its argument strength.

pdf abs
A News Editorial Corpus for Mining Argumentation Strategies
Khalid Al-Khatib | Henning Wachsmuth | Johannes Kiesel | Matthias Hagen | Benno Stein
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Many argumentative texts, and news editorials in particular, follow a specific strategy to persuade their readers of some opinion or attitude. This includes decisions such as when to tell an anecdote or where to support an assumption with statistics, which is reflected by the composition of different types of argumentative discourse units in a text. While several argument mining corpora have recently been published, they do not allow the study of argumentation strategies due to incomplete or coarse-grained unit annotations. This paper presents a novel corpus with 300 editorials from three diverse news portals that provides the basis for mining argumentation strategies. Each unit in all editorials has been assigned one of six types by three annotators with a high Fleiss’ Kappa agreement of 0.56. We investigate various challenges of the annotation process and we conduct a first corpus analysis. Our results reveal different strategies across the news portals, exemplifying the benefit of studying editorials—a so far underresourced text genre in argument mining.

pdf
Cross-Domain Mining of Argumentative Text through Distant Supervision
Khalid Al-Khatib | Henning Wachsmuth | Matthias Hagen | Jonas Köhler | Benno Stein
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies