Yue Zhao
2026
Defenses Against Prompt Attacks Learn Surface Heuristics
Li Li | Chenxiao Yu | Zhiyu Ni | Hao Li | Charith Peris | Chaowei Xiao | Yue Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) are increasingly deployed in security-sensitive applications, where they must follow system- or developer-specified instructions that define the intended task behavior, while completing benign user requests. When adversarial instructions appear in user queries or externally retrieved content, models may override intended logic. Recent defenses rely on supervised fine-tuning with benign and malicious labels. Although these methods achieve high attack rejection rates, we find that they rely on narrow correlations in defense data rather than harmful intent, leading to systematic rejection of safe inputs. We analyze three recurring shortcut behaviors induced by defense fine-tuning. Position bias arises when benign content placed later in a prompt is rejected at much higher rates; across reasoning benchmarks, suffix-task rejection rises from below 10% to as high as 90%. Token trigger bias occurs when strings common in attack data raise rejection probability even in benign contexts; inserting a single trigger token increases false refusals by up to 50%. Topic generalization bias reflects poor generalization beyond the defense data distribution, with defended models suffering test-time accuracy drops of up to 40%. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent. We introduce controlled diagnostic datasets and a systematic evaluation across two base models and multiple defense pipelines, highlighting limitations of supervised fine-tuning for reliable LLM security.
CoAct: Co-Active LLM Preference Learning with Human-AI Synergy
Ruiyao Xu | Mihir Parmar | Tiankai Yang | Zhengyu Hu | Yue Zhao | Kaize Ding
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Learning from preference-based feedback has become an effective approach for aligning LLMs across diverse tasks. However, high-quality human-annotated preference data remains expensive and scarce. Existing methods address this challenge through either self-rewarding, which scales by using purely AI-generated labels but risks unreliability, or active learning, which ensures quality through oracle annotation but cannot fully leverage unlabeled data. In this paper, we present CoAct, a novel framework that synergistically combines self-rewarding and active learning through strategic human-AI collaboration. CoAct leverages self-consistency to identify both reliable self-labeled data and samples requiring oracle verification. Additionally, oracle feedback guides the model to generate new instructions within its solvable capability. Evaluated on three reasoning benchmarks across two model families, CoAct achieves average improvements of +13.25% on GSM8K, +8.19% on MATH, and +13.16% on WebInstruct, consistently outperforming all baselines.
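The routing idea in the abstract above (self-consistency decides which samples keep their self-generated label and which go to an oracle) can be sketched as follows. This is a minimal illustration and not the CoAct implementation: the function name `route_by_self_consistency`, the majority-vote agreement measure, and the 0.8 threshold are all assumptions made for the example.

```python
from collections import Counter


def route_by_self_consistency(sampled_answers, threshold=0.8):
    """Route one prompt based on agreement among sampled model answers.

    sampled_answers: list of final answers from repeated sampling (hypothetical input).
    threshold: assumed agreement cutoff; not a value taken from the paper.
    Returns (decision, majority_answer, agreement).
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    majority_answer, votes = counts.most_common(1)[0]
    agreement = votes / len(sampled_answers)
    # High agreement: trust the self-generated label; otherwise ask the oracle.
    decision = "self_label" if agreement >= threshold else "ask_oracle"
    return decision, majority_answer, agreement


if __name__ == "__main__":
    print(route_by_self_consistency(["42", "42", "42", "42", "42"]))
    print(route_by_self_consistency(["42", "42", "41", "40", "42"]))
```

A single threshold is the simplest possible rule; the actual selection criterion may differ.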
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Jinbo Liu | Defu Cao | Yifei Wei | Tianyao Su | Yuan Liang | Yushun Dong | Yan Liu | Yue Zhao | Xiyang Hu
Findings of the Association for Computational Linguistics: ACL 2026
Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a controlled evaluation framework for comparing topology-conditioned memory leakage in multi-agent LLM systems. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent’s memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over 10 rounds, we measure leakage using a two-stage recovery criterion that combines exact-match extraction with LLM-based inference over the attacker’s final output. We evaluate six canonical topologies (complete, circle, chain, tree, star, star-ring) across n∈{4,5,6}, attacker–target placements, and base models. Results are consistent: denser connectivity, shorter attacker–target distance, and higher target centrality increase leakage; most leakage occurs in early rounds and then plateaus; model choice shifts absolute rates but preserves broad structural trends; spatiotemporal/location attributes leak more readily than identity credentials or regulated identifiers. We distill practical guidance for system design: favor sparse or hierarchical connectivity, maximize attacker–target separation, and restrict hub/shortcut pathways via topology-aware access control. Our code is available at https://github.com/llll121/mama-eval.
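The structural quantities named in the abstract above (connectivity density and attacker-target distance) can be computed for a few of the listed topologies with a short sketch. This is not the MAMA code: the helper names `build_topology`, `hop_distance`, and `density` are hypothetical, agent 0 is assumed to be the hub of the star, and the tree and star-ring layouts are omitted because the abstract does not define them.

```python
from collections import deque
from itertools import combinations


def build_topology(name, n):
    """Return an undirected edge set over agents 0..n-1 (assumed layouts)."""
    if name == "complete":
        return set(combinations(range(n), 2))
    if name == "chain":
        return {(i, i + 1) for i in range(n - 1)}
    if name == "circle":
        return {(i, i + 1) for i in range(n - 1)} | {(0, n - 1)}
    if name == "star":
        return {(0, i) for i in range(1, n)}  # agent 0 is the hub
    raise ValueError(f"unsupported topology: {name}")


def hop_distance(edges, source, target):
    """Breadth-first search for the shortest hop count; None if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def density(edges, n):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2)


if __name__ == "__main__":
    n = 5
    for name in ("complete", "circle", "chain", "star"):
        edges = build_topology(name, n)
        print(name, density(edges, n), hop_distance(edges, 1, 3))
```

Under the reported trends, topologies with higher density and shorter attacker-target hop distance would be expected to leak more.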
A Survey on LLM-based Conversational User Simulation
Bo Ni | Yu Wang | Leyao Wang | Branislav Kveton | Franck Dernoncourt | Yu Xia | Hongjie Chen | Reuben Luera | Samyadeep Basu | Subhojyoti Mukherjee | Puneet Mathur | Nesreen K. Ahmed | Junda Wu | Li Li | Huixin Zhang | Ruiyi Zhang | Tong Yu | Sungchul Kim | Jiuxiang Gu | Zhengzhong Tu | Alexa Siu | Zichao Wang | Seunghyun Yoon | Nedim Lipka | Namyong Park | Zihao Lin | Trung Bui | Yue Zhao | Tyler Derr | Ryan A. Rossi
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study. Recent advancements in large language models (LLMs) have significantly catalyzed progress in this domain by enabling high-fidelity generation of synthetic user conversation. In this paper, we survey recent advancements in LLM-based conversational user simulation. We introduce a novel taxonomy covering user granularity and simulation objectives. Additionally, we systematically analyze core techniques and evaluation methodologies. We aim to keep the research community informed of the latest advancements in conversational user simulation and to further facilitate future research by identifying open challenges and organizing existing work under a unified framework.
2025
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li | Jiashu Qu | Linxin Song | Yuxiao Zhou | Yuehan Qin | Tiankai Yang | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Vision-Language Models (VLMs) excel at tasks such as image captioning and visual question answering but frequently produce hallucinated outputs that deviate from the actual visual input or prompt. While prior work links hallucination to biases in data or representation, their causal origins remain unclear. We propose a causal framework to analyze and mitigate hallucination in VLMs. Our key hypothesis is that hallucinations arise from unintended direct influences of the vision or text modality that bypass the intended multi-modal fusion. To examine this, we construct a causal graph of the VLM and use counterfactual analysis to estimate the Natural Direct Effect (NDE) of each modality and their interaction. By systematically identifying and suppressing these direct effects, we encourage outputs that are more faithfully grounded in true cross-modal reasoning. Our approach consists of three steps: (1) designing structural causal graphs to distinguish correct fusion pathways from spurious modality shortcuts, (2) estimating modality-specific and cross-modal NDE using perturbed image representations, hallucinated text embeddings, and degraded visual inputs, and (3) implementing a test-time intervention module to dynamically adjust the model’s dependence on each modality. Experimental results demonstrate that our method significantly reduces hallucination while preserving task performance, providing a robust and interpretable framework for improving VLM reliability.
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
Yanbo Wang | Jiayi Ye | Siyuan Wu | Chujie Gao | Yue Huang | Xiuying Chen | Yue Zhao | Xiangliang Zhang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Ensuring the trustworthiness of Generative Foundation Models (GenFMs) is a pressing challenge as they gain widespread use. Existing evaluation toolkits are often limited in scope, dynamism, and flexibility. This paper introduces TRUSTEVAL, a dynamic and comprehensive toolkit designed for evaluating GenFMs across various dimensions. TRUSTEVAL supports both dynamic dataset generation and evaluation, offering advanced features including comprehensiveness, usability, and flexibility. TRUSTEVAL integrates diverse generative models, datasets, evaluation methods, metrics, inference efficiency enhancement, and evaluation report generation. Through case studies, we demonstrate TRUSTEVAL’s potential to advance the trustworthiness evaluation of GenFMs.
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
Tiankai Yang | Yi Nian | Li Li | Ruiyao Xu | Yuangang Li | Jiaqi Li | Zhuo Xiao | Xiyang Hu | Ryan A. Rossi | Kaize Ding | Xia Hu | Yue Zhao
Findings of the Association for Computational Linguistics: ACL 2025
Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential in AD has not been studied enough. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs’ pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different datasets, we find that LLMs can work well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.
LLM-Empowered Patient-Provider Communication: A Data-Centric Survey From a Clinical Perspective
Ruosi Shao | Md Shamim Seraj | Kangyi Zhao | Yingtao Luo | Lincan Li | Bolin Shen | Averi Bates | Yue Zhao | Chongle Pan | Lisa Hightow-Weidman | Shayok Chakraborty | Yushun Dong
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Large language models (LLMs) hold promise for advancing patient–provider communication, yet a persistent gap remains between benchmark-driven model development and the realities of clinical practice. This work presents a systematic, clinically grounded review of text-based medical datasets for LLM training and evaluation. We propose a scenario-based taxonomy derived from established clinical frameworks to map major knowledge-based and conversation-based corpora against core communication scenarios. We further synthesize core communication skills from gold-standard clinical assessment instruments and meta-analyze state-of-the-art medical LLM performance, highlighting how dataset properties, fine-tuning strategies, and evaluation metrics shape both knowledge acquisition and communicative competence. To empirically validate these findings, we conducted controlled fine-tuning experiments across representative LLMs, demonstrating that data composition and scenario alignment critically affect model performance. Our findings highlight the urgent need for scenario-rich datasets and standardized, human-centered evaluation protocols to advance clinically relevant medical LLMs.
NLP-ADBench: NLP Anomaly Detection Benchmark
Yuangang Li | Jiaqi Li | Zhuo Xiao | Tiankai Yang | Yi Nian | Xiyang Hu | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Anomaly detection (AD) is an important machine learning task with applications in fraud detection, content moderation, and user behavior analysis. However, AD is relatively understudied in a natural language processing (NLP) context, limiting its effectiveness in detecting harmful content, phishing attempts, and spam reviews. We introduce NLP-ADBench, the most comprehensive NLP anomaly detection (NLP-AD) benchmark to date, which includes eight curated datasets and 19 state-of-the-art algorithms. These span 3 end-to-end methods and 16 two-step approaches that adapt classical, non-AD methods to language embeddings from BERT and OpenAI. Our empirical results show that no single model dominates across all datasets, indicating a need for automated model selection. Moreover, two-step methods with transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings outperforming those of BERT. We release NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, providing a unified framework for NLP-AD and supporting future investigations.
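The two-step pattern described in the abstract above (embed the text first, then score the embedding with a classical detector) can be illustrated with a toy sketch. It is not taken from NLP-ADBench: a normalized bag-of-words vector stands in for the BERT or OpenAI embeddings, a k-nearest-neighbor distance stands in for the classical detectors, and the names `embed` and `knn_score` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Step 1 (stand-in): L2-normalized bag-of-words vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def knn_score(query, train, k=2):
    """Step 2: anomaly score = Euclidean distance to the k-th nearest training vector."""
    dists = sorted(math.dist(query, t) for t in train)
    return dists[k - 1]


# Toy data: the training set contains only "normal" (sports) sentences.
train_texts = [
    "the match ended in a draw",
    "the team won the match",
    "the coach praised the team",
]
normal_query = "the coach praised the match"
odd_query = "win free money now click here"

# Fixed vocabulary built from every sentence used in this example.
all_texts = train_texts + [normal_query, odd_query]
vocab = sorted({w for t in all_texts for w in t.lower().split()})

train_vecs = [embed(t, vocab) for t in train_texts]
normal_score = knn_score(embed(normal_query, vocab), train_vecs)
odd_score = knn_score(embed(odd_query, vocab), train_vecs)

if __name__ == "__main__":
    print(f"normal: {normal_score:.3f}  off-topic: {odd_score:.3f}")
```

The off-topic sentence shares no words with the training set, so it sits farther from every training vector and receives the higher score.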
AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection
Tiankai Yang | Junjun Liu | Michael Siu | Jiahang Wang | Zhuangzhuang Qian | Chanjuan Song | Cheng Cheng | Xiyang Hu | Yue Zhao
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Anomaly detection (AD) is essential in areas such as fraud detection, network monitoring, and scientific research. However, the diversity of data modalities and the increasing number of specialized AD libraries pose challenges for non-expert users who lack in-depth library-specific knowledge and advanced programming skills. To tackle this, we present AD-AGENT, an LLM-driven multi-agent framework that turns natural-language instructions into fully executable AD pipelines. AD-AGENT coordinates specialized agents for intent parsing, data preparation, library and model selection, documentation mining, and iterative code generation and debugging. Using a shared short-term workspace and a long-term cache, the agents integrate popular AD libraries like PyOD, PyGOD, and TSLib into a unified workflow. Experiments demonstrate that AD-AGENT produces reliable scripts and recommends competitive models across libraries. The system is open-sourced to support further research and practical applications in AD.
Co-authors
- Tiankai Yang 5
- Xiyang Hu 4
- Li Li 4
- Ryan A. Rossi 3
- Nesreen K. Ahmed 2
- Hongjie Chen 2
- Franck Dernoncourt 2
- Kaize Ding 2
- Yushun Dong 2
- Jiuxiang Gu 2
- Sungchul Kim 2
- Branislav Kveton 2
- Yuangang Li 2
- Jiaqi Li 2
- Nedim Lipka 2
- Puneet Mathur 2
- Subhojyoti Mukherjee 2
- Yi Nian 2
- Namyong Park 2
- Yu Wang 2
- Zichao Wang 2
- Junda Wu 2
- Yu Xia 2
- Zhuo Xiao 2
- Ruiyao Xu 2
- Seunghyun Yoon 2
- Tong Yu 2
- Ruiyi Zhang 2
- Ryan Aponte 1
- Joe Barrow 1
- Samyadeep Basu 1
- Averi Bates 1
- Trung Bui 1
- Defu Cao 1
- Shayok Chakraborty 1
- Xiang Chen 1
- Xiuying Chen 1
- Cheng Cheng 1
- Hanieh Deilamsalehy 1
- Tyler Derr 1
- Chujie Gao 1
- Lisa Hightow-Weidman 1
- Zhengyu Hu 1
- Zhengmian Hu 1
- Xia Hu 1
- Ting-Hao Huang 1
- Yue Huang 1
- Hao Li 1
- Xintong Li 1
- Lincan Li 1
- Yuan Liang 1
- Zihao Lin 1
- Jinbo Liu 1
- Yan Liu 1
- Junjun Liu 1
- Reuben Luera 1
- Jiebo Luo 1
- Yingtao Luo 1
- Hanjia Lyu 1
- Julian McAuley 1
- Koyel Mukherjee 1
- Thien Huu Nguyen 1
- Zhiyu Ni 1
- Bo Ni 1
- Soumyabrata Pal 1
- Chongle Pan 1
- Mihir Parmar 1
- Charith Peris 1
- Zhuangzhuang Qian 1
- Yuehan Qin 1
- Jiashu Qu 1
- Md Shamim Seraj 1
- Ruosi Shao 1
- Bolin Shen 1
- Alexa Siu 1
- Michael Siu 1
- Linxin Song 1
- Chanjuan Song 1
- Tianyao Su 1
- Zhengzhong Tu 1
- Yanbo Wang 1
- Leyao Wang 1
- Jiahang Wang 1
- Yifei Wei 1
- Siyuan Wu 1
- Chaowei Xiao 1
- Zhouhang Xie 1
- Jiayi Ye 1
- Chenxiao Yu 1
- Zhehao Zhang 1
- Xiangliang Zhang 1
- Huixin Zhang 1
- Kangyi Zhao 1
- Yuxiao Zhou 1