Counter-arguments (CAs) are an effective means of improving learners' critical-thinking skills, especially since composing a CA requires thoroughly considering the logic of the initial argument (IA). Although several tasks have been created for identifying the logical structure of CAs, no prior work has focused on capturing multiple interpretations of logical structures, owing to their complexity. In this work, we create CALSA+, a dataset consisting of 134 CAs annotated with 13 logical predicate questions. CALSA+ contains 1,742 instances annotated by 3 expert annotators (5,226 total annotations) with good agreement (Krippendorff's α = 0.46). Using CALSA+, we train a model with Reinforcement Learning with Verifiable Rewards (RLVR) to identify multiple logical interpretations and show that models trained with RLVR can perform on par with much larger proprietary models. Our work is the first to attempt to annotate all interpretations of the logical structure of CAs. We publicly release our dataset to facilitate research on identifying the logical structure of CAs.
Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention paid to explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies, designed to make a fallacy's implicit logic explicit. Using our templates, we conduct an annotation study on 400 fallacious arguments taken from the LOGIC dataset and achieve a high agreement score (Krippendorff's α = 0.54) and reasonable coverage (83%). Finally, we conduct an experiment on detecting the structure of fallacies and find that state-of-the-art language models struggle to detect fallacy templates (0.47 accuracy). To facilitate research on fallacies, we make our dataset and guidelines publicly available.
The use of argumentation in education has been shown to improve students' critical-thinking skills, and computational models for argumentation have been developed to further assist this process. Although these models are useful for evaluating the quality of an argument, they often cannot explain why a particular score was predicted, i.e., why the argument is good or bad. This makes it difficult to provide constructive feedback to users, e.g., students, so that they can strengthen their critical-thinking skills. In this survey, we explore current NLP feedback systems by categorizing each along four important dimensions of feedback (Richness, Visualization, Interactivity, and Personalization). We discuss the limitations of each dimension and provide suggestions to enhance the power of feedback and explanations, ultimately aiming to improve users' critical-thinking skills.