Yue Zhao
2026
A Survey on LLM-based Conversational User Simulation
Bo Ni | Yu Wang | Leyao Wang | Branislav Kveton | Franck Dernoncourt | Yu Xia | Hongjie Chen | Reuben Luera | Samyadeep Basu | Subhojyoti Mukherjee | Puneet Mathur | Nesreen K. Ahmed | Junda Wu | Li Li | Huixin Zhang | Ruiyi Zhang | Tong Yu | Sungchul Kim | Jiuxiang Gu | Zhengzhong Tu | Alexa Siu | Zichao Wang | Seunghyun Yoon | Nedim Lipka | Namyong Park | Zihao Lin | Trung Bui | Yue Zhao | Tyler Derr | Ryan A. Rossi
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study. Recent advancements in large language models (LLMs) have significantly catalyzed progress in this domain by enabling high-fidelity generation of synthetic user conversations. In this paper, we survey recent advancements in LLM-based conversational user simulation. We introduce a novel taxonomy covering user granularity and simulation objectives. Additionally, we systematically analyze core techniques and evaluation methodologies. We aim to keep the research community informed of the latest advancements in conversational user simulation and to further facilitate future research by identifying open challenges and organizing existing work under a unified framework.
2025
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
Tiankai Yang | Yi Nian | Li Li | Ruiyao Xu | Yuangang Li | Jiaqi Li | Zhuo Xiao | Xiyang Hu | Ryan A. Rossi | Kaize Ding | Xia Hu | Yue Zhao
Findings of the Association for Computational Linguistics: ACL 2025
Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential in AD has not been studied enough. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs’ pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different datasets, we find that LLMs can work well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.
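The zero-shot setting the abstract describes can be sketched as follows. The prompt wording and the `call_llm` hook are illustrative assumptions, not the benchmark's actual protocol; any real run would plug in an LLM client.

```python
# Hedged sketch of zero-shot NLP anomaly detection: ask the LLM directly
# whether a text fits a known-normal category, with no task-specific
# training. Prompt phrasing and `call_llm` are illustrative assumptions.
def build_zero_shot_prompt(category: str, text: str) -> str:
    return (
        f"You are an anomaly detector for {category} texts.\n"
        "Normal examples belong to the category; anomalies do not.\n"
        f"Text: {text!r}\n"
        "Answer with exactly one word, 'normal' or 'anomaly'."
    )

def detect(text: str, category: str, call_llm) -> bool:
    """Returns True if the LLM labels the text anomalous."""
    answer = call_llm(build_zero_shot_prompt(category, text))
    return answer.strip().lower().startswith("anomaly")

# Stub in place of a real LLM client, for demonstration only:
verdict = detect("WIN A FREE IPHONE", "product reviews", lambda p: "anomaly")
print(verdict)  # → True
```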
NLP-ADBench: NLP Anomaly Detection Benchmark
Yuangang Li | Jiaqi Li | Zhuo Xiao | Tiankai Yang | Yi Nian | Xiyang Hu | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Anomaly detection (AD) is an important machine learning task with applications in fraud detection, content moderation, and user behavior analysis. However, AD is relatively understudied in a natural language processing (NLP) context, limiting its effectiveness in detecting harmful content, phishing attempts, and spam reviews. We introduce NLP-ADBench, the most comprehensive NLP anomaly detection (NLP-AD) benchmark to date, which includes eight curated datasets and 19 state-of-the-art algorithms. These span 3 end-to-end methods and 16 two-step approaches that adapt classical, non-AD methods to language embeddings from BERT and OpenAI. Our empirical results show that no single model dominates across all datasets, indicating a need for automated model selection. Moreover, two-step methods with transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings outperforming those of BERT. We release NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, providing a unified framework for NLP-AD and supporting future investigations.
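The "two-step" recipe the abstract evaluates — embed each text, then run a classical outlier detector on the embeddings — can be sketched in a dependency-free form. A letter-frequency vector stands in for the BERT/OpenAI embeddings, and distance-to-centroid stands in for the detector; both stand-ins are illustrative assumptions, not the benchmark's methods.

```python
# Minimal sketch of two-step NLP-AD: (1) text -> vector, (2) classical
# outlier scoring on the vectors. The embedding and detector here are
# toy stand-ins for the transformer embeddings and classical detectors
# actually benchmarked.
from collections import Counter
import math
import string

def embed(text: str) -> list[float]:
    """Toy embedding: normalized letter-frequency vector."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in string.ascii_lowercase]

def anomaly_scores(texts: list[str]) -> list[float]:
    """Distance of each embedding to the centroid; higher = more anomalous."""
    vecs = [embed(t) for t in texts]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [math.dist(v, centroid) for v in vecs]

texts = [
    "great product, arrived on time",
    "works as described, very happy",
    "fast shipping and solid build",
    "XXXXXX zzz XXXXXX zzz XXXXXX",  # obviously atypical string
]
scores = anomaly_scores(texts)
print(max(range(len(texts)), key=scores.__getitem__))  # → 3 (the outlier)
```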
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li | Jiashu Qu | Linxin Song | Yuxiao Zhou | Yuehan Qin | Tiankai Yang | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Vision-Language Models (VLMs) excel at tasks such as image captioning and visual question answering but frequently produce hallucinated outputs that deviate from the actual visual input or prompt. While prior work links hallucination to biases in data or representation, their causal origins remain unclear. We propose a causal framework to analyze and mitigate hallucination in VLMs. Our key hypothesis is that hallucinations arise from unintended direct influences of the vision or text modality that bypass the intended multi-modal fusion. To examine this, we construct a causal graph of the VLM and use counterfactual analysis to estimate the Natural Direct Effect (NDE) of each modality and their interaction. By systematically identifying and suppressing these direct effects, we encourage outputs that are more faithfully grounded in true cross-modal reasoning. Our approach consists of three steps: (1) designing structural causal graphs to distinguish correct fusion pathways from spurious modality shortcuts, (2) estimating modality-specific and cross-modal NDE using perturbed image representations, hallucinated text embeddings, and degraded visual inputs, and (3) implementing a test-time intervention module to dynamically adjust the model’s dependence on each modality. Experimental results demonstrate that our method significantly reduces hallucination while preserving task performance, providing a robust and interpretable framework for improving VLM reliability.
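The Natural Direct Effect the abstract estimates follows the standard mediation-analysis form; the instantiation below, with $v$ the image, $t$ the prompt, $F$ the fusion module, and $v^{*}$ a perturbed baseline image, uses our own notation and is a sketch rather than the paper's exact estimator.

```latex
\mathrm{NDE}_{V} = \mathbb{E}\big[\,Y\big(v,\; F(v^{*}, t)\big)\,\big]
                 - \mathbb{E}\big[\,Y\big(v^{*},\; F(v^{*}, t)\big)\,\big]
```

Both terms hold the fused representation at its counterfactual value $F(v^{*}, t)$, so the difference isolates how much the vision input influences the output $Y$ directly, bypassing fusion; a large $\mathrm{NDE}_{V}$ flags the kind of modality shortcut the framework suppresses.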
AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection
Tiankai Yang | Junjun Liu | Michael Siu | Jiahang Wang | Zhuangzhuang Qian | Chanjuan Song | Cheng Cheng | Xiyang Hu | Yue Zhao
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Anomaly detection (AD) is essential in areas such as fraud detection, network monitoring, and scientific research. However, the diversity of data modalities and the increasing number of specialized AD libraries pose challenges for non-expert users who lack in-depth library-specific knowledge and advanced programming skills. To tackle this, we present AD-AGENT, an LLM-driven multi-agent framework that turns natural-language instructions into fully executable AD pipelines. AD-AGENT coordinates specialized agents for intent parsing, data preparation, library and model selection, documentation mining, and iterative code generation and debugging. Using a shared short-term workspace and a long-term cache, the agents integrate popular AD libraries like PyOD, PyGOD, and TSLib into a unified workflow. Experiments demonstrate that AD-AGENT produces reliable scripts and recommends competitive models across libraries. The system is open-sourced to support further research and practical applications in AD.
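The agent flow the abstract outlines — specialized stages over a shared short-term workspace, with iterative code generation — can be sketched schematically. The stage names follow the abstract, but every function body below is an illustrative stub, not AD-AGENT's implementation.

```python
# Schematic sketch of the pipeline: each stage reads/writes a shared
# workspace dict, and code generation retries until a script passes a
# check. All stage logic here is a stand-in assumption.
def parse_intent(ws):
    ws["task"] = "tabular outlier detection"        # from the user's request

def prepare_data(ws):
    ws["data"] = [[0.1], [0.2], [9.9]]              # loaded + cleaned input

def select_model(ws):
    ws["library"], ws["model"] = "PyOD", "IForest"  # mined from docs/cache

def generate_code(ws, max_retries=3):
    for _attempt in range(max_retries):
        script = f"# fit {ws['library']}.{ws['model']} on {len(ws['data'])} rows"
        try:
            compile(script, "<generated>", "exec")  # stand-in for a test run
            ws["script"] = script
            return
        except SyntaxError:
            continue                                # feed error back, retry
    raise RuntimeError("code generation failed")

workspace = {}                                      # shared short-term memory
for stage in (parse_intent, prepare_data, select_model, generate_code):
    stage(workspace)
print(workspace["script"])  # → "# fit PyOD.IForest on 3 rows"
```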
LLM-Empowered Patient-Provider Communication: A Data-Centric Survey From a Clinical Perspective
Ruosi Shao | Md Shamim Seraj | Kangyi Zhao | Yingtao Luo | Lincan Li | Bolin Shen | Averi Bates | Yue Zhao | Chongle Pan | Lisa Hightow-Weidman | Shayok Chakraborty | Yushun Dong
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Large language models (LLMs) hold promise for advancing patient–provider communication, yet a persistent gap remains between benchmark-driven model development and the realities of clinical practice. This work presents a systematic, clinically grounded review of text-based medical datasets for LLM training and evaluation. We propose a scenario-based taxonomy derived from established clinical frameworks to map major knowledge-based and conversation-based corpora against core communication scenarios. We further synthesize core communication skills from gold-standard clinical assessment instruments and meta-analyze state-of-the-art medical LLM performance, highlighting how dataset properties, fine-tuning strategies, and evaluation metrics shape both knowledge acquisition and communicative competence. To empirically validate these findings, we conducted controlled fine-tuning experiments across representative LLMs, demonstrating that data composition and scenario alignment critically affect model performance. Our findings highlight the urgent need for scenario-rich datasets and standardized, human-centered evaluation protocols to advance clinically relevant medical LLMs.
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
Yanbo Wang | Jiayi Ye | Siyuan Wu | Chujie Gao | Yue Huang | Xiuying Chen | Yue Zhao | Xiangliang Zhang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Ensuring the trustworthiness of Generative Foundation Models (GenFMs) is a pressing challenge as they gain widespread use. Existing evaluation toolkits are often limited in scope, dynamism, and flexibility. This paper introduces TRUSTEVAL, a dynamic and comprehensive toolkit designed for evaluating GenFMs across various dimensions. TRUSTEVAL supports both dynamic dataset generation and evaluation, offering advanced features including comprehensiveness, usability, and flexibility. TRUSTEVAL integrates diverse generative models, datasets, evaluation methods, metrics, inference efficiency enhancement, and evaluation report generation. Through case studies, we demonstrate TRUSTEVAL’s potential to advance the trustworthiness evaluation of GenFMs.
Co-authors
- Tiankai Yang 4
- Xiyang Hu 3
- Li Li 3
- Ryan A. Rossi 3
- Nesreen K. Ahmed 2
- Hongjie Chen 2
- Franck Dernoncourt 2
- Jiuxiang Gu 2
- Sungchul Kim 2
- Branislav Kveton 2
- Yuangang Li 2
- Jiaqi Li 2
- Nedim Lipka 2
- Puneet Mathur 2
- Subhojyoti Mukherjee 2
- Yi Nian 2
- Namyong Park 2
- Yu Wang 2
- Zichao Wang 2
- Junda Wu 2
- Yu Xia 2
- Zhuo Xiao 2
- Seunghyun Yoon 2
- Tong Yu 2
- Ruiyi Zhang 2
- Ryan Aponte 1
- Joe Barrow 1
- Samyadeep Basu 1
- Averi Bates 1
- Trung Bui 1
- Shayok Chakraborty 1
- Xiang Chen 1
- Xiuying Chen 1
- Cheng Cheng 1
- Hanieh Deilamsalehy 1
- Tyler Derr 1
- Kaize Ding 1
- Yushun Dong 1
- Chujie Gao 1
- Lisa Hightow-Weidman 1
- Zhengmian Hu 1
- Xia Hu 1
- Ting-Hao Huang 1
- Yue Huang 1
- Xintong Li 1
- Lincan Li 1
- Zihao Lin 1
- Junjun Liu 1
- Reuben Luera 1
- Jiebo Luo 1
- Yingtao Luo 1
- Hanjia Lyu 1
- Julian McAuley 1
- Koyel Mukherjee 1
- Thien Huu Nguyen 1
- Bo Ni 1
- Soumyabrata Pal 1
- Chongle Pan 1
- Zhuangzhuang Qian 1
- Yuehan Qin 1
- Jiashu Qu 1
- Md Shamim Seraj 1
- Ruosi Shao 1
- Bolin Shen 1
- Michael Siu 1
- Alexa Siu 1
- Linxin Song 1
- Chanjuan Song 1
- Zhengzhong Tu 1
- Jiahang Wang 1
- Yanbo Wang 1
- Leyao Wang 1
- Siyuan Wu 1
- Zhouhang Xie 1
- Ruiyao Xu 1
- Jiayi Ye 1
- Zhehao Zhang 1
- Xiangliang Zhang 1
- Huixin Zhang 1
- Kangyi Zhao 1
- Yuxiao Zhou 1