Junchi Yao


2025

Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang | Shenzhe Zhu | Zeyu Wu | Keyu Wang | Junchi Yao | Junchao Wu | Lijie Hu | Mengdi Li | Derek F. Wong | Di Wang
Findings of the Association for Computational Linguistics: ACL 2025

With the increasing integration of large language models (LLMs) into real-world applications such as finance, e-commerce, and recommendation systems, their susceptibility to misinformation and adversarial manipulation poses significant risks. Existing fraud detection benchmarks primarily focus on single-turn classification tasks, failing to capture the dynamic nature of real-world fraud attempts. To address this gap, we introduce Fraud-R1, a challenging bilingual benchmark designed to assess LLMs’ ability to resist fraud and phishing attacks across five key fraud categories: Fraudulent Services, Impersonation, Phishing Scams, Fake Job Postings, and Online Relationships, each covering several subclasses. Our dataset comprises manually curated fraud cases from social media, news, phishing scam records, and prior fraud datasets.
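
As an informal illustration of what a multi-round robustness check could look like, the sketch below feeds escalating fraud inducements to a model turn by turn and marks the case as failed as soon as the model complies. The query_llm callable, refusal_check predicate, and pass/fail criterion are hypothetical placeholders for illustration only, not the benchmark's actual protocol.

def run_multi_round_case(query_llm, inducements, refusal_check):
    """Feed escalating fraud inducements turn by turn; the case passes only
    if the model refuses (or warns about) the scam at every round."""
    history = []
    for turn, inducement in enumerate(inducements, start=1):
        history.append({"role": "user", "content": inducement})
        reply = query_llm(history)                     # model sees the full dialogue so far
        history.append({"role": "assistant", "content": reply})
        if not refusal_check(reply):                   # model was induced into the fraud
            return {"passed": False, "failed_at_round": turn}
    return {"passed": True, "failed_at_round": None}

A multi-round setup of this kind captures the escalation dynamic that single-turn classification benchmarks miss: a model that refuses the first inducement may still comply after later, more persuasive follow-ups.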

Understanding the Repeat Curse in Large Language Models from a Feature Perspective
Junchi Yao | Shu Yang | Jianhua Xu | Lijie Hu | Mengdi Li | Di Wang
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) have made remarkable progress in various domains, yet they often suffer from repetitive text generation, a phenomenon we refer to as the “Repeat Curse”. While previous studies have proposed decoding strategies to mitigate repetition, the underlying mechanism behind this issue remains insufficiently explored. In this work, we investigate the root causes of repetition in LLMs through the lens of mechanistic interpretability. Inspired by recent advances in Sparse Autoencoders (SAEs), which enable monosemantic feature extraction, we propose a novel approach, “Duplicatus Charm”, to induce and analyze the Repeat Curse. Our method systematically identifies “Repetition Features”, the key model activations responsible for generating repetitive outputs. First, we locate the layers most involved in repetition through logit analysis. Next, we extract and stimulate the relevant features using SAE-based activation manipulation. To validate our approach, we construct a repetition dataset covering token- and paragraph-level repetitions and introduce an evaluation pipeline to quantify the influence of the identified repetition features. Finally, by deactivating these features, we effectively mitigate the Repeat Curse.
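
To make the feature-deactivation step concrete, here is a minimal sketch of SAE-based feature ablation via a forward hook, written under assumed details: the SparseAutoencoder class, the layer index, the latent dimensions, and the repetition-feature indices in the usage comment are hypothetical placeholders, not the paper's released code.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: maps residual-stream activations into an overcomplete
    sparse latent space and reconstructs them."""
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def encode(self, x):
        return torch.relu(self.encoder(x))

    def decode(self, z):
        return self.decoder(z)

def make_ablation_hook(sae: SparseAutoencoder, feature_ids):
    """Build a forward hook that zeroes the chosen SAE features and writes
    the reconstructed activation back into the layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        z = sae.encode(hidden)
        z[..., feature_ids] = 0.0          # deactivate the identified repetition features
        patched = sae.decode(z)
        if isinstance(output, tuple):
            return (patched,) + output[1:]
        return patched
    return hook

# Usage sketch (hypothetical layer index and feature ids):
# sae = SparseAutoencoder(d_model=768, d_latent=16384)
# handle = model.transformer.h[10].register_forward_hook(
#     make_ablation_hook(sae, feature_ids=[1234, 5678]))
# ...run generation and compare repetition metrics...
# handle.remove()

Stimulating rather than suppressing the same features would amount to scaling the selected latent coordinates up instead of zeroing them, which mirrors the induce-then-ablate logic the abstract describes.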