Ye Chen

2026

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
Xudong Shen | Li Yuan | Ye Chen | Xin Wu | Yi Cai | Zhiyong Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While Large Language Models (LLMs) exhibit strong semantic capabilities, their resilience to manipulative linguistic patterns such as logical fallacies remains an underexplored area. Prior work has focused on the ability of LLMs to **identify** or **classify** fallacies, but their robustness against these fallacies in persuasive contexts remains largely unexplored. To address this gap, we introduce **LoFa** (Logical Fallacy), a comprehensive benchmark to evaluate LLM robustness against fallacies. We first construct the **LoFa** dataset via a multi-agent pipeline, pairing factual questions with fallacious arguments. Then, we develop a multi-round debate framework to assess model resilience under sustained attacks. Furthermore, to disentangle robustness from a model’s inherent knowledge limitations, we propose a new metric, LFR@k (Logical Fallacy Resistance), to quantify performance. Our experiments reveal that different LLMs exhibit varied robustness to distinct types of fallacies, highlighting unique vulnerability profiles across models.

