Gabriel Freedman


2025

pdf bib
Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models
Kevin Zhou | Adam Dejl | Gabriel Freedman | Lihu Chen | Antonio Rago | Francesca Toni
Findings of the Association for Computational Linguistics: EMNLP 2025

Research in uncertainty quantification (UQ) for large language models (LLMs) is increasingly important towards guaranteeing the reliability of this groundbreaking technology. We explore the integration of LLM UQ methods in argumentative LLMs (ArgLLMs), an explainable LLM framework for decision-making based on computational argumentation in which UQ plays a critical role. We conduct experiments to evaluate ArgLLMs’ performance on claim verification tasks when using different LLM UQ methods, inherently performing an assessment of the UQ methods’ effectiveness. Moreover, the experimental procedure itself is a novel way of evaluating the effectiveness of UQ methods, especially when intricate and potentially contentious statements are present. Our results demonstrate that, despite its simplicity, direct prompting is an effective UQ strategy in ArgLLMs, outperforming considerably more complex approaches.

2024

pdf bib
Detecting Scientific Fraud Using Argument Mining
Gabriel Freedman | Francesca Toni
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

A proliferation of fraudulent scientific research in recent years has precipitated a greater interest in more effective methods of detection. There are many varieties of academic fraud, but a particularly challenging type to detect is the use of paper mills and the faking of peer-review. To the best of our knowledge, there have so far been no attempts to automate this process.The complexity of this issue precludes the use of heuristic methods, like pattern-matching techniques, which are employed for other types of fraud. Our proposed method in this paper uses techniques from the Computational Argumentation literature (i.e. argument mining and argument quality evaluation). Our central hypothesis stems from the assumption that articles that have not been subject to the proper level of scrutiny will contain poorly formed and reasoned arguments, relative to legitimately published papers. We use a variety of corpora to test this approach, including a collection of abstracts taken from retracted papers. We show significant improvement compared to a number of baselines, suggesting that this approach merits further investigation.