Roman Garaev

2026

Efficient Hallucination Detection in Automatic Code Generation
Georgii Andriushchenko | Roman Garaev | Lyudmila Rvanova | Artem Shelmanov | Vladimir V. Ivanov
Findings of the Association for Computational Linguistics: ACL 2026

Large language models (LLMs) frequently produce source code that seems correct and well-formed, yet includes hallucinated elements that cause downstream test failures. In this study, we benchmark state-of-the-art uncertainty quantification methods and existing baselines for the task of hallucination detection in source code and introduce a diff-based pipeline to construct a code dataset annotated with line-level hallucinations. Building on this, we train a lightweight Transformer-based detector that uses LLM internal representations to identify hallucinations, substantially outperforming existing methods across several code generation domains. The detector also shows particular promise for enabling self-correction in LLM-based coding agents. We release the first publicly available dataset of line-level code hallucinations, along with the corresponding source code and trained hallucination detectors, at https://github.com/datapaf/CodeHallucinationDetection.
