Michał Brzozowski
2025
Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution
Zuzanna Dubanowska
|
Maciej Żelaszczyk
|
Michał Brzozowski
|
Paolo Mandica
|
Michal P. Karpowicz
Findings of the Association for Computational Linguistics: EMNLP 2025
We critically assess the efficacy of the current SOTA in hallucination detection and find that its performance on the RAGTruth dataset is largely driven by a spurious correlation with data. Controlling for this effect, state-of-the-art performs no better than supervised linear probes, while requiring extensive hyperparameter tuning across datasets. Out-of-distribution generalization is currently out of reach, with all of the analyzed methods performing close to random. We propose a set of guidelines for hallucination detection and its evaluation.