Ye Chen

2026

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
Xudong Shen | Li Yuan | Ye Chen | Xin Wu | Yi Cai | Zhiyong Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While Large Language Models (LLMs) exhibit strong semantic capabilities, their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has focused on the ability of LLMs to **identify** or **classify** fallacies, leaving their robustness against these fallacies in persuasive contexts largely unexamined. To address this gap, we introduce **LoFa** (Logical Fallacy), a comprehensive benchmark to evaluate LLM robustness against fallacies. We first construct the **LoFa** dataset via a multi-agent pipeline, pairing factual questions with fallacious arguments. Then, we develop a multi-round debate framework to assess model resilience under sustained attacks. Furthermore, to disentangle robustness from a model’s inherent knowledge limitations, we propose a new metric, LFR@k (Logical Fallacy Resistance), to quantify performance. Our experiments reveal that different LLMs exhibit varied robustness to distinct types of fallacies, highlighting unique vulnerability profiles across models.
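The abstract does not give the formula for LFR@k, so the snippet below is only a minimal sketch under an assumed definition: among questions the model answers correctly before any fallacious argument is shown, LFR@k is taken to be the share whose correct answer survives all of the first k debate rounds. Restricting to initially correct questions is one plausible way to "disentangle robustness from inherent knowledge limitations". The `DebateRecord` type, its field names, and `lfr_at_k` are hypothetical illustrations, not the benchmark's code.

```python
from dataclasses import dataclass, field


@dataclass
class DebateRecord:
    # Hypothetical per-question record; field names are illustrative only.
    correct_before_attack: bool  # answer was correct with no fallacy present
    correct_after_round: list[bool] = field(default_factory=list)  # correctness after each attack round


def lfr_at_k(records: list[DebateRecord], k: int) -> float:
    """Assumed LFR@k: fraction of initially correct questions that stay
    correct through every one of the first k fallacious debate rounds."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Only count questions the model already knows, so a knowledge gap
    # is not mistaken for a lack of robustness.
    known = [r for r in records if r.correct_before_attack]
    if not known:
        return 0.0
    resisted = sum(
        1
        for r in known
        if len(r.correct_after_round) >= k and all(r.correct_after_round[:k])
    )
    return resisted / len(known)


# Usage with made-up records (not data from the paper).
records = [
    DebateRecord(True, [True, True, False]),
    DebateRecord(True, [True, True, True]),
    DebateRecord(False, [False, False, False]),  # excluded: not known beforehand
    DebateRecord(True, [False, True, True]),
]
print(round(lfr_at_k(records, 1), 3))  # → 0.667
```

Requiring every round up to k to be correct (rather than only round k) treats a model that is talked out of the right answer and later recovers as not having resisted; the actual benchmark may score this differently.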


Teaching LLM to be Persuasive: Reward-Enhanced Policy Optimization for Alignment from Heterogeneous Rewards
Xia Zeng | Yihan Chen | Luhui Liu | Chao Luo | Ye Chen | Zhuangzhuoran
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

We deploy large language models (LLMs) as business development (BD) agents for persuasive price negotiation in online travel agencies (OTAs). The agent must follow a multi-stage Standard Operating Procedure (SOP) and strict guardrails (no over-promising and no hallucinations), while remaining human-like and effective over long, multi-turn dialogues. We propose Reward-Enhanced Policy Optimization (REPO), a reinforcement learning post-training method that combines heterogeneous rewards: a preference-trained reward model (RM), an LLM-as-a-judge (RJ) for nuanced behaviors (e.g., emotional value and SOP compliance), and rule-based reward functions (RF) (mainly regex-based) for deterministic checks on numerics, formatting, and guardrails. In an expert consensus evaluation (three human experts; 30 online conversations and 45 curated bad cases), REPO improves the average dialogue rating to 4.63 (+0.33 over GRPO) and raises the share of conversations with at least one excellent response to 66.67% (+23.34 pp over GRPO), while achieving a 93.33% bad-case fix rate with 75.56% clean fixes. In a production A/B test on 9,653 real customer conversations (vs. an intent-driven dialogue system), REPO improves the response rate by +12.14 pp and the task success rate by +5.94 pp (p < 0.001).
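To illustrate what combining heterogeneous rewards can look like, here is a minimal sketch under stated assumptions. The abstract names the three sources (RM, RJ, and mainly regex-based RF) but not how they are merged, so this sketch assumes the rule-based checks act as a hard gate (any violation returns -1) and that the RM and judge scores are otherwise mixed by a weighted sum. The regex patterns, the `USD` price format, the weights, and the function names `rule_reward` and `combined_reward` are all hypothetical; the RM and judge are stubbed as plain float inputs.

```python
import re

# Hypothetical guardrail patterns; the paper's actual rules are not given in the abstract.
OVERPROMISE = re.compile(
    r"\b(guarantee[sd]?|100% (?:sure|certain)|definitely will)\b", re.IGNORECASE
)
# Assumed price format, e.g. "USD 120" or "$120.50"; captures the numeric part.
PRICE = re.compile(r"(?:\$|USD\s?)(\d+(?:\.\d+)?)")


def rule_reward(response: str, allowed_prices: set[float]) -> float:
    """Deterministic checks (assumed): -1.0 on any violation, else +1.0."""
    if OVERPROMISE.search(response):
        return -1.0  # guardrail: no over-promising
    quoted = {float(p) for p in PRICE.findall(response)}
    if not quoted <= allowed_prices:
        return -1.0  # numeric check: every quoted price must come from the context
    return 1.0


def combined_reward(
    response: str,
    allowed_prices: set[float],
    rm_score: float,     # stand-in for the preference-trained reward model's score
    judge_score: float,  # stand-in for the LLM-as-a-judge score
    w_rm: float = 0.5,
    w_judge: float = 0.5,
) -> float:
    """Assumed combination: a failed rule check overrides the learned signals;
    otherwise return a weighted sum of RM and judge scores."""
    if rule_reward(response, allowed_prices) < 0:
        return -1.0
    return w_rm * rm_score + w_judge * judge_score


print(combined_reward("We can offer USD 120 per night.", {120.0}, 0.8, 0.6))
print(combined_reward("We guarantee USD 120 per night.", {120.0}, 0.8, 0.6))  # -1.0
```

Gating on the deterministic checks reflects the abstract's framing of numerics and guardrails as strict constraints rather than soft preferences; a softer alternative would add the rule score as a third weighted term.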

Co-authors

Xin Wu 1

Zhiyong Wu 1

Li Yuan 1

Xia Zeng 1

Zhuangzhuoran 1

Venues

ACL 2
