Srdjan Vesic

2026

Towards Complex Debate Understanding: Predicting Claim Impact Scores through the Modelling of Claim Interactions
Maxime Brouat | Mihai Surdeanu | Srdjan Vesic | Eduardo Blanco
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Structured debates can be naturally modeled as argument graphs, with claims connected by support and attack relations, a representation formalised in Computational Argumentation Theory. In this paper, we propose a novel neural architecture that jointly models both the textual content of claims and their relational structure. Claims are encoded using contextualised embeddings and compressed through a feedforward compression layer. Then, a graph attention network explicitly captures attack/support interactions. Trained on real-world debates from the Kialo platform, our model predicts the distribution of user-assigned impact votes for each claim. It achieves a mean absolute error (MAE) of 0.068, significantly outperforming both text-only and structure-only baselines. Further experiments show strong out-of-domain generalisation across thematic clusters, as well as suggestive correlations between the model’s attention patterns and human voting behaviour. An analysis of linguistic and graph-based features suggests that the model relies on latent argumentative patterns as well as the text. Our findings also shed light on language differences between strong and weak claims, as determined by humans as well as by our best model.
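As a minimal sketch of the evaluation metric named in the abstract, the snippet below computes a mean absolute error between a predicted and an observed vote distribution for a single claim. The five-bin vote scale, the per-bin averaging, and the function name `vote_distribution_mae` are assumptions made for illustration; the abstract does not say how the reported MAE of 0.068 is aggregated.

```python
from typing import Sequence


def vote_distribution_mae(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error between two vote distributions, averaged over bins.

    Assumption: both inputs are probability distributions over the same
    impact-vote bins (five bins are used in the example below).
    """
    if len(predicted) != len(observed):
        raise ValueError("distributions must have the same number of bins")
    if not predicted:
        raise ValueError("distributions must not be empty")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical distributions for one claim (not data from the paper).
predicted = [0.1, 0.2, 0.3, 0.2, 0.2]
observed = [0.1, 0.1, 0.4, 0.2, 0.2]
error = vote_distribution_mae(predicted, observed)
```

A corpus-level score under this assumption would be the average of `error` over all claims.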

2025

Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics
Reza Sanayei | Srdjan Vesic | Eduardo Blanco | Mihai Surdeanu
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) excel at linear reasoning tasks but remain underexplored on non-linear structures such as those found in natural debates, which are best expressed as argument graphs. We evaluate whether LLMs can approximate structured reasoning from Computational Argumentation Theory (CAT). Specifically, we use Quantitative Argumentation Debate (QuAD) semantics, which assigns acceptability scores to arguments based on their attack and support relations. Given only dialogue-formatted debates from two NoDE datasets, models are prompted to rank arguments without access to the underlying graph. We test several LLMs under advanced instruction strategies, including Chain-of-Thought and In-Context Learning. While models show moderate alignment with QuAD rankings, performance degrades with longer inputs or disrupted discourse flow. Advanced prompting helps mitigate these effects by reducing biases related to argument length and position. Our findings highlight both the promise and limitations of LLMs in modeling formal argumentation semantics and motivate future work on graph-aware reasoning.
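The idea of scoring arguments from their attack and support relations can be illustrated with a short sketch. It follows one common formulation of QuAD-style aggregation on a tree of arguments: attackers scale the base score down, supporters push it towards 1, and the two results are averaged when both are present. The class name `Argument`, the function name `quad_score`, and the base scores are assumptions for illustration; the abstract does not state which variant or base scores the paper uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod


@dataclass
class Argument:
    base_score: float  # intrinsic strength in [0, 1] (assumed, not from the paper)
    attackers: list[Argument] = field(default_factory=list)
    supporters: list[Argument] = field(default_factory=list)


def quad_score(arg: Argument) -> float:
    """Recursively score an argument in a tree-shaped debate."""
    attack_scores = [quad_score(a) for a in arg.attackers]
    support_scores = [quad_score(s) for s in arg.supporters]

    # Attackers weaken the base score multiplicatively.
    weakened = arg.base_score * prod(1 - v for v in attack_scores)
    # Supporters close part of the gap between the base score and 1.
    strengthened = 1 - (1 - arg.base_score) * prod(1 - v for v in support_scores)

    if attack_scores and support_scores:
        return (weakened + strengthened) / 2
    if attack_scores:
        return weakened
    if support_scores:
        return strengthened
    return arg.base_score


# Hypothetical debate: one root claim with one attacker and one supporter.
root = Argument(0.5, attackers=[Argument(0.5)], supporters=[Argument(0.5)])
root_score = quad_score(root)
```

Ranking the arguments of a debate by such scores gives the kind of reference ordering against which model rankings can be compared.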

Venues

Findings: 1
LREC: 1
