Michał Brzozowski

2025

Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution
Zuzanna Dubanowska | Maciej Żelaszczyk | Michał Brzozowski | Paolo Mandica | Michal P. Karpowicz
Findings of the Association for Computational Linguistics: EMNLP 2025

We critically assess the efficacy of the current SOTA in hallucination detection and find that its performance on the RAGTruth dataset is largely driven by a spurious correlation with data. Controlling for this effect, state-of-the-art performs no better than supervised linear probes, while requiring extensive hyperparameter tuning across datasets. Out-of-distribution generalization is currently out of reach, with all of the analyzed methods performing close to random. We propose a set of guidelines for hallucination detection and its evaluation.

Co-authors

Zuzanna Dubanowska 1
Michal P. Karpowicz 1
Paolo Mandica 1
Maciej Żelaszczyk 1

Venues

Findings 1
