Yarin Gal

2025

pdf bib abs
Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions
Hazel Kim | Tom A. Lamb | Adel Bibi | Philip Torr | Yarin Gal
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) frequently generate confident yet inaccurate responses, introducing significant risks for deployment in safety-critical domains. We present a novel, test-time approach to detecting model hallucination through systematic analysis of information flow across model layers. We target cases when LLMs process inputs with ambiguous or insufficient context. Our investigation reveals that hallucination manifests as usable information deficiencies in inter-layer transmissions. While existing approaches primarily focus on final-layer output analysis, we demonstrate that tracking cross-layer information dynamics (ℒI) provides robust indicators of model reliability, accounting for both information gain and loss during computation. I improves model reliability by immediately integrating with universal LLMs without additional training or architectural modifications.

pdf bib abs
Simple Factuality Probes Detect Hallucinations in Long-Form Natural Language Generation
Jiatong Han | Neil Band | Muhammed Razzak | Jannik Kossen | Tim G. J. Rudner | Yarin Gal
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) often mislead users with confident hallucinations. Current approaches to detect hallucination require many samples from the LLM generator, which is computationally infeasible as frontier model sizes and generation lengths continue to grow. We present a remarkably simple baseline for detecting hallucinations in long-form LLM generations, with performance comparable to expensive multi-sample approaches while drawing only a single sample from the LLM generator. Our key finding is that LLM hidden states are highly predictive of factuality in long-form natural language generation and that this information can be efficiently extracted at inference time using a lightweight probe. We benchmark a variety of long-form hallucination detection methods across open-weight models up to 405B parameters and demonstrate that our approach achieves competitive performance with up to 100x fewer FLOPs. Furthermore, our probes generalize to out-of-distribution model outputs, evaluated using hidden states of smaller open-source models. Our results demonstrate the promise of hidden state probes in detecting long-form LLM hallucinations.

2023

pdf bib abs
Revisiting Automated Prompting: Are We Actually Doing Better?
Yulin Zhou | Yiren Zhao | Ilia Shumailov | Robert Mullins | Yarin Gal
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Current literature demonstrates that Large Language Models (LLMs) are great few-shot learners, and prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. An attempt to automate human-led prompting followed, with some progress achieved. In particular, subsequent work demonstrates that automation can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompting. Our work suggests that, in addition to fine-tuning, manual prompting should be used as a baseline in this line of research.

Yarin Gal

2025

2023

2013

Co-authors

Venues