Yue Zhao


2025

From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
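
As a rough illustration of the classical "selection" step the abstract refers to, the sketch below ranks unlabeled items by predictive entropy and picks the most uncertain ones for labeling. Entropy-based uncertainty sampling is only one common selection strategy and is not taken from the survey itself; the function names and the toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities predicted by some model for four
# unlabeled items (made-up numbers, for illustration only).
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # moderately confident
    [0.50, 0.50],  # maximally uncertain
]

selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

The selected indices would then be sent to an annotator (human or, in the settings the survey covers, an LLM) before the model is retrained.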

AD-LLM: Benchmarking Large Language Models for Anomaly Detection
Tiankai Yang | Yi Nian | Li Li | Ruiyao Xu | Yuangang Li | Jiaqi Li | Zhuo Xiao | Xiyang Hu | Ryan A. Rossi | Kaize Ding | Xia Hu | Yue Zhao
Findings of the Association for Computational Linguistics: ACL 2025

Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential in AD has not been studied enough. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs’ pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different datasets, we find that LLMs can work well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.

NLP-ADBench: NLP Anomaly Detection Benchmark
Yuangang Li | Jiaqi Li | Zhuo Xiao | Tiankai Yang | Yi Nian | Xiyang Hu | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Anomaly detection (AD) is an important machine learning task with applications in fraud detection, content moderation, and user behavior analysis. However, AD is relatively understudied in a natural language processing (NLP) context, limiting its effectiveness in detecting harmful content, phishing attempts, and spam reviews. We introduce NLP-ADBench, the most comprehensive NLP anomaly detection (NLP-AD) benchmark to date, which includes eight curated datasets and 19 state-of-the-art algorithms. These span 3 end-to-end methods and 16 two-step approaches that adapt classical, non-NLP AD methods to language embeddings from BERT and OpenAI. Our empirical results show that no single model dominates across all datasets, indicating a need for automated model selection. Moreover, two-step methods with transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings outperforming those of BERT. We release NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, providing a unified framework for NLP-AD and supporting future investigations.
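
To make the "two-step" idea concrete, here is a minimal sketch: texts are first mapped to fixed-size embeddings, then a classical detector scores the vectors. The embedding step is assumed to have happened elsewhere and is replaced by made-up 2-D vectors, and the k-nearest-neighbour distance score is just one simple classical detector chosen for illustration; it is not claimed to be one of the benchmark's 16 configurations. All names are hypothetical.

```python
import math


def knn_anomaly_scores(embeddings, k=2):
    """Score each vector by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, a in enumerate(embeddings):
        # Distances from vector i to every other vector, smallest first.
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Step 1 (assumed, not shown): a text encoder turns each document into a
# vector. Toy 2-D vectors stand in for real embeddings here.
embeddings = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [0.1, 0.1],
    [3.0, 3.0],  # far from the cluster, so it should score highest
]

# Step 2: a classical detector scores the embedded documents.
scores = knn_anomaly_scores(embeddings, k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, [round(s, 3) for s in scores])
```

A higher score means the document sits farther from its neighbours in embedding space; in practice a threshold or a top-n cut-off would turn the scores into anomaly labels.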

Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li | Jiashu Qu | Linxin Song | Yuxiao Zhou | Yuehan Qin | Tiankai Yang | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Vision-Language Models (VLMs) excel at tasks such as image captioning and visual question answering but frequently produce hallucinated outputs that deviate from the actual visual input or prompt. While prior work links hallucination to biases in data or representation, their causal origins remain unclear. We propose a causal framework to analyze and mitigate hallucination in VLMs. Our key hypothesis is that hallucinations arise from unintended direct influences of the vision or text modality that bypass the intended multi-modal fusion. To examine this, we construct a causal graph of the VLM and use counterfactual analysis to estimate the Natural Direct Effect (NDE) of each modality and their interaction. By systematically identifying and suppressing these direct effects, we encourage outputs that are more faithfully grounded in true cross-modal reasoning. Our approach consists of three steps: (1) designing structural causal graphs to distinguish correct fusion pathways from spurious modality shortcuts, (2) estimating modality-specific and cross-modal NDE using perturbed image representations, hallucinated text embeddings, and degraded visual inputs, and (3) implementing a test-time intervention module to dynamically adjust the model’s dependence on each modality. Experimental results demonstrate that our method significantly reduces hallucination while preserving task performance, providing a robust and interpretable framework for improving VLM reliability.
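
For readers unfamiliar with the term, the standard textbook definition of the natural direct effect from causal mediation analysis can be written as below. The symbols are assumptions for illustration: x is the observed value of the input of interest, x* a reference (counterfactual) value, M the mediator, and Y the outcome. The paper's own notation and estimators may differ.

```latex
\mathrm{NDE} \;=\; \mathbb{E}\big[\, Y\big(x,\; M(x^{*})\big) \,\big] \;-\; \mathbb{E}\big[\, Y\big(x^{*},\; M(x^{*})\big) \,\big]
```

The mediator is held at the value it would take under the reference input, so the difference isolates the influence that reaches the outcome without passing through the mediator. In the abstract's setting this corresponds to a modality affecting the output while bypassing multi-modal fusion.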

AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection
Tiankai Yang | Junjun Liu | Michael Siu | Jiahang Wang | Zhuangzhuang Qian | Chanjuan Song | Cheng Cheng | Xiyang Hu | Yue Zhao
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Anomaly detection (AD) is essential in areas such as fraud detection, network monitoring, and scientific research. However, the diversity of data modalities and the increasing number of specialized AD libraries pose challenges for non-expert users who lack in-depth library-specific knowledge and advanced programming skills. To tackle this, we present AD-AGENT, an LLM-driven multi-agent framework that turns natural-language instructions into fully executable AD pipelines. AD-AGENT coordinates specialized agents for intent parsing, data preparation, library and model selection, documentation mining, and iterative code generation and debugging. Using a shared short-term workspace and a long-term cache, the agents integrate popular AD libraries like PyOD, PyGOD, and TSLib into a unified workflow. Experiments demonstrate that AD-AGENT produces reliable scripts and recommends competitive models across libraries. The system is open-sourced to support further research and practical applications in AD.

LLM-Empowered Patient-Provider Communication: A Data-Centric Survey From a Clinical Perspective
Ruosi Shao | Md Shamim Seraj | Kangyi Zhao | Yingtao Luo | Lincan Li | Bolin Shen | Averi Bates | Yue Zhao | Chongle Pan | Lisa Hightow-Weidman | Shayok Chakraborty | Yushun Dong
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Large language models (LLMs) hold promise for advancing patient–provider communication, yet a persistent gap remains between benchmark-driven model development and the realities of clinical practice. This work presents a systematic, clinically grounded review of text-based medical datasets for LLM training and evaluation. We propose a scenario-based taxonomy derived from established clinical frameworks to map major knowledge-based and conversation-based corpora against core communication scenarios. We further synthesize core communication skills from gold-standard clinical assessment instruments and meta-analyze state-of-the-art medical LLM performance, highlighting how dataset properties, fine-tuning strategies, and evaluation metrics shape both knowledge acquisition and communicative competence. To empirically validate these findings, we conducted controlled fine-tuning experiments across representative LLMs, demonstrating that data composition and scenario alignment critically affect model performance. Our findings highlight the urgent need for scenario-rich datasets and standardized, human-centered evaluation protocols to advance clinically relevant medical LLMs.