2025
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Qing Li | Jiahui Geng | Zongxiong Chen | Derui Zhu | Yuxia Wang | Congbo Ma | Chenyang Lyu | Fakhri Karray
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, large language models (LLMs) have made remarkable advancements, yet hallucination, where models produce inaccurate or non-factual statements, remains a significant challenge for real-world deployment. Although current classification-based methods, such as SAPLMA, are highly efficient in mitigating hallucinations, they struggle when non-factual information arises in the early or mid-sequence of outputs, reducing their reliability. To address these issues, we propose Hallucination Detection-Neural Differential Equations (HD-NDEs), a novel method that systematically assesses the truthfulness of statements by capturing the full dynamics of LLMs within their latent space. Our approach applies neural differential equations (Neural DEs) to model the dynamic system in the latent space of LLMs. The sequence in the latent space is then mapped to the classification space for truth assessment. Extensive experiments across five datasets and six widely used LLMs demonstrate the effectiveness of HD-NDEs, which notably achieves an improvement of over 14% in AUC-ROC on the True-False dataset compared to state-of-the-art techniques.
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
Jiahui Geng | Qing Li | Zongxiong Chen | Yuxia Wang | Derui Zhu | Zhuohan Xie | Chenyang Lyu | Xiuying Chen | Preslav Nakov | Fakhri Karray
Findings of the Association for Computational Linguistics: ACL 2025
The rapid advancement of vision-language models (VLMs) has drawn considerable attention to their safety alignment. However, existing methods have primarily focused on model undersafety, where the model responds to hazardous queries, while neglecting oversafety, where the model refuses to answer safe queries. In this paper, we introduce the concept of safety calibration, which systematically addresses both undersafety and oversafety. Specifically, we present VSCBench, a novel dataset of 3,600 image-text pairs that are visually or textually similar but differ in terms of safety, designed to evaluate safety calibration across image-centric and text-centric scenarios. Based on our benchmark, we evaluate safety calibration across eleven widely used VLMs. Our extensive experiments reveal major issues with both undersafety and oversafety. We further investigate four approaches to improve the models’ safety calibration. We find that although some methods effectively calibrate the models’ safety, they also degrade the models’ utility. This trade-off underscores the urgent need for advanced calibration methods, and our benchmark provides a valuable tool for evaluating future approaches.
2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu | Dingfan Chen | Qing Li | Zongxiong Chen | Lei Ma | Jens Grossklags | Mario Fritz
Findings of the Association for Computational Linguistics: NAACL 2024