Jonathan Clayton


2026

We create, and make publicly available, a novel dataset for the task of Argument Summary Graph Parsing (ASGP), which we call SENSEI-ASG, based on annotating a subset of the SENSEI corpus. Given an argumentative dialogue, such as might be found in a social media exchange, ASGP is the task of creating an Argument Summary Graph (ASG): a data structure consisting of nodes that contain summaries of the arguments in a dialogue, and edges that show the argumentative relations between them. We find that the only existing ASG dataset, Debatabase-ASG, is not representative of online debates in terms of language use, dialogue length, or graph complexity. In contrast to Debatabase-ASG, which was created from a curated debate collection, SENSEI-ASG contains examples of spontaneous debates arising in the comments sections of an online newspaper (namely, The Guardian). We achieve moderate inter-annotator agreement on the dataset, with a Cohen's κ of 0.57, reflecting the inherent challenge of distinguishing argumentative from non-argumentative text. We propose baselines for the new dataset by fine-tuning Llama-3 for the ASGP task, using the two ASGP datasets and an additional out-of-domain argument mining dataset, the AAEC.
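To make the ASG data structure more concrete, the following is a minimal Python sketch of a graph whose nodes hold argument summaries and whose edges carry argumentative relations. The class and method names (ArgumentSummaryGraph, add_node, add_edge, relations_to), the relation labels "support" and "attack", and the toy dialogue are all hypothetical illustrations; they are not the file format or label set of SENSEI-ASG or Debatabase-ASG.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArgumentNode:
    """A node holding a short summary of one argument in the dialogue."""
    node_id: str
    summary: str


@dataclass
class ArgumentEdge:
    """A directed argumentative relation from one node to another.

    The relation labels used below ("support", "attack") are an
    assumption for illustration only.
    """
    source: str
    target: str
    relation: str


@dataclass
class ArgumentSummaryGraph:
    """Hypothetical container: summary nodes keyed by id, plus a list of relation edges."""
    nodes: dict[str, ArgumentNode] = field(default_factory=dict)
    edges: list[ArgumentEdge] = field(default_factory=list)

    def add_node(self, node_id: str, summary: str) -> None:
        self.nodes[node_id] = ArgumentNode(node_id, summary)

    def add_edge(self, source: str, target: str, relation: str) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in edge {source} -> {target}")
        self.edges.append(ArgumentEdge(source, target, relation))

    def relations_to(self, target: str) -> list[tuple[str, str]]:
        """Return (source summary, relation) pairs for edges pointing at target."""
        return [
            (self.nodes[e.source].summary, e.relation)
            for e in self.edges
            if e.target == target
        ]


# Toy example (invented, not taken from either dataset).
asg = ArgumentSummaryGraph()
asg.add_node("n1", "Public transport should be free.")
asg.add_node("n2", "Free fares would reduce car traffic.")
asg.add_node("n3", "Free fares would strain city budgets.")
asg.add_edge("n2", "n1", "support")
asg.add_edge("n3", "n1", "attack")
print(asg.relations_to("n1"))
# [('Free fares would reduce car traffic.', 'support'), ('Free fares would strain city budgets.', 'attack')]
```

Keeping edges in a flat list keeps the sketch simple; a parser's output could be serialised the same way, although a real system would likely also need to link each summary node back to the dialogue turns it summarises.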

2022

This paper proposes a novel task in Argument Mining, which we call Reasoning Marker Prediction. We reuse the popular Persuasive Essays Corpus (Stab and Gurevych, 2014). Instead of using this corpus for Argument Structure Parsing, we use a simple heuristic to identify text spans that act as reasoning markers. We propose baseline methods for automatically predicting the presence of these reasoning markers, and publicly release a script that generates the data for the task.
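The abstract does not describe the heuristic itself, so the following is only a hypothetical illustration of what a simple marker-identification heuristic could look like: a case-insensitive, whole-word match against a small lexicon of connectives. The lexicon (MARKERS) and the function name find_reasoning_markers are assumptions, not the method used in the paper.

```python
from __future__ import annotations

import re

# Hypothetical marker lexicon; NOT the heuristic used in the paper.
MARKERS: tuple[str, ...] = (
    "because",
    "therefore",
    "for example",
    "however",
    "in conclusion",
    "thus",
)


def find_reasoning_markers(
    text: str, markers: tuple[str, ...] = MARKERS
) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) spans of whole-word, case-insensitive marker matches."""
    # Try longer markers first so multi-word phrases win over any shorter prefix.
    ordered = sorted(markers, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


print(find_reasoning_markers(
    "Cars pollute. Therefore, cities should ban them because air quality matters."
))
# [(14, 23, 'Therefore'), (48, 55, 'because')]
```

A fixed lexicon like this is easy to inspect but will miss markers outside the list and will flag non-argumentative uses of words such as "however", which is one reason a learned predictor is of interest.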