HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs
Silvana Abdi, Mahrokh Hassani, Rosalien Kinds, Timo Strijbis, Roman Terpstra
Abstract
Large Language Models (LLMs) suffer from a critical limitation: hallucinations, which refers to models generating fluent but factually incorrect text. This paper presents our approach to hallucination detection in English model outputs as part of the SemEval-2025 Task 3 (Mu-SHROOM). Our method, HalluRAG-RUG, integrates Retrieval-Augmented Generation (RAG) using Llama-3 and prediction models using token probabilities and semantic similarity. We retrieved relevant factual information using a named entity recognition (NER)-based Wikipedia search and applied abstractive summarization to refine the knowledge base. The hallucination detection pipeline then used this retrieved knowledge to identify inconsistent spans in model-generated text. This result was combined with the results of two systems which identified hallucinations based on token probabilities and low-similarity sentences. Our system placed 33rd out of 41, performing slightly below the ‘mark all’ baseline but surpassing the ‘mark none’ and ‘neural’ baselines with an IoU of 0.3093 and a correlation of 0.0833.- Anthology ID:
- 2025.semeval-1.116
- Volume:
- Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
- Venues:
- SemEval | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 846–851
- Language:
- URL:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.116/
- DOI:
- Cite (ACL):
- Silvana Abdi, Mahrokh Hassani, Rosalien Kinds, Timo Strijbis, and Roman Terpstra. 2025. HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 846–851, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs (Abdi et al., SemEval 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.116.pdf