2025
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
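To make these roles concrete, here is a minimal sketch of one iteration of such a loop, with the LLM handling both selection and annotation (generation of new instances would slot in similarly). The `query_llm` wrapper, the prompts, and the sentiment task are illustrative assumptions, not details from the paper.

```python
# One round of an LLM-based active learning loop (illustrative sketch).
# `query_llm` is a hypothetical stand-in for any chat-completion client.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def llm_score_informativeness(text: str) -> float:
    """Selection role: ask the LLM how informative labeling this example would be."""
    reply = query_llm(
        "On a scale from 0 to 1, how informative would labeling the following "
        "example be for training a sentiment classifier? Reply with a number only.\n\n"
        + text
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def llm_annotate(text: str) -> str:
    """Annotation role: use the LLM as a cost-effective annotator."""
    return query_llm("Label the sentiment of this text as positive or negative:\n" + text)

def active_learning_round(unlabeled_pool: list[str], budget: int) -> list[tuple[str, str]]:
    # Rank the pool by LLM-estimated informativeness, then label only
    # the top-`budget` examples for the next training round.
    ranked = sorted(unlabeled_pool, key=llm_score_informativeness, reverse=True)
    return [(text, llm_annotate(text)) for text in ranked[:budget]]
```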
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
Tiankai Yang | Yi Nian | Li Li | Ruiyao Xu | Yuangang Li | Jiaqi Li | Zhuo Xiao | Xiyang Hu | Ryan A. Rossi | Kaize Ding | Xia Hu | Yue Zhao
Findings of the Association for Computational Linguistics: ACL 2025
Anomaly detection (AD) is an important machine learning task with many real-world applications, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential in AD remains underexplored. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs’ pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different datasets, we find that LLMs can work well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.
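As an illustration of the zero-shot setting (task (i) above), the sketch below simply prompts an LLM to flag anomalous text using its pre-trained knowledge. The `chat` wrapper and the prompt wording are hypothetical and do not reproduce the benchmark's actual protocol.

```python
# Zero-shot NLP anomaly detection with an LLM (illustrative sketch).
# `chat` is a hypothetical stand-in for any chat-completion client.

def chat(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def zero_shot_is_anomaly(sample: str, corpus_description: str) -> bool:
    """Flag `sample` as anomalous using only the LLM's pre-trained knowledge."""
    reply = chat(
        f"The following sample comes from {corpus_description}.\n"
        "Answer 'yes' if the sample looks anomalous (e.g., spam, misinformation, "
        "or otherwise out of place) and 'no' otherwise.\n\n"
        f"Sample: {sample}\nAnswer:"
    )
    return reply.strip().lower().startswith("yes")
```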
NLP-ADBench: NLP Anomaly Detection Benchmark
Yuangang Li | Jiaqi Li | Zhuo Xiao | Tiankai Yang | Yi Nian | Xiyang Hu | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Anomaly detection (AD) is an important machine learning task with applications in fraud detection, content moderation, and user behavior analysis. However, AD is relatively understudied in a natural language processing (NLP) context, limiting its effectiveness in detecting harmful content, phishing attempts, and spam reviews. We introduce NLP-ADBench, the most comprehensive NLP anomaly detection (NLP-AD) benchmark to date, which includes eight curated datasets and 19 state-of-the-art algorithms. These span three end-to-end methods and 16 two-step approaches that adapt classical detection methods to language embeddings from BERT and OpenAI. Our empirical results show that no single model dominates across all datasets, indicating a need for automated model selection. Moreover, two-step methods with transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings surpassing those of BERT. We release NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, providing a unified framework for NLP-AD and supporting future investigations.
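For concreteness, the sketch below shows the two-step recipe the results favor: embed documents with a pretrained transformer, then score the embeddings with a classical unsupervised detector. The specific encoder (`all-MiniLM-L6-v2` via sentence-transformers) and detector (Isolation Forest from scikit-learn) are illustrative choices, not necessarily among the benchmark's 16 combinations.

```python
# Two-step NLP anomaly detection (illustrative sketch):
# step 1 embeds text with a pretrained transformer,
# step 2 scores the embeddings with a classical detector.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import IsolationForest

docs = [
    "Great product, arrived on time.",
    "Exactly as described, would buy again.",
    "CLICK HERE to claim your FREE $1000 gift card now!!!",
]

# Step 1: language embeddings (a BERT-family encoder here;
# OpenAI embeddings would slot in the same way).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(docs)

# Step 2: an unsupervised detector fit on the embeddings.
detector = IsolationForest(random_state=0).fit(embeddings)
scores = -detector.score_samples(embeddings)  # higher = more anomalous
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```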
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li | Jiashu Qu | Linxin Song | Yuxiao Zhou | Yuehan Qin | Tiankai Yang | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
Vision-Language Models (VLMs) excel at tasks such as image captioning and visual question answering but frequently produce hallucinated outputs that deviate from the actual visual input or prompt. While prior work links hallucination to biases in data or representation, their causal origins remain unclear. We propose a causal framework to analyze and mitigate hallucination in VLMs. Our key hypothesis is that hallucinations arise from unintended direct influences of the vision or text modality that bypass the intended multi-modal fusion. To examine this, we construct a causal graph of the VLM and use counterfactual analysis to estimate the Natural Direct Effect (NDE) of each modality and their interaction. By systematically identifying and suppressing these direct effects, we encourage outputs that are more faithfully grounded in true cross-modal reasoning. Our approach consists of three steps: (1) designing structural causal graphs to distinguish correct fusion pathways from spurious modality shortcuts, (2) estimating modality-specific and cross-modal NDE using perturbed image representations, hallucinated text embeddings, and degraded visual inputs, and (3) implementing a test-time intervention module to dynamically adjust the model’s dependence on each modality. Experimental results demonstrate that our method significantly reduces hallucination while preserving task performance, providing a robust and interpretable framework for improving VLM reliability.
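To illustrate the counterfactual contrast behind a modality-specific NDE estimate, the sketch below compares answer logits on the factual input against their average under perturbed visual inputs with the text held fixed; a large gap suggests the output rides on a direct visual shortcut rather than grounded cross-modal fusion. The `vlm_logits` and `perturb_image` helpers are hypothetical placeholders, and the sketch omits the paper's full causal graph and test-time intervention module.

```python
# Approximating the vision modality's Natural Direct Effect (illustrative sketch).
import numpy as np

def vlm_logits(image: np.ndarray, prompt: str) -> np.ndarray:
    """Hypothetical VLM forward pass returning answer logits."""
    raise NotImplementedError("plug in a VLM here")

def perturb_image(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Counterfactual visual input, e.g. a noise-degraded image."""
    return image + rng.normal(scale=0.1, size=image.shape)

def vision_direct_effect(image: np.ndarray, prompt: str,
                         n_samples: int = 8, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    factual = vlm_logits(image, prompt)
    # Average outcome under counterfactual visual inputs, text held fixed.
    counterfactual = np.mean(
        [vlm_logits(perturb_image(image, rng), prompt) for _ in range(n_samples)],
        axis=0,
    )
    # Per-answer gap: large values indicate reliance on the vision
    # pathway that bypasses cross-modal fusion.
    return factual - counterfactual
```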