@inproceedings{schopf-etal-2025-natural,
title = "Natural Language Inference Fine-tuning for Scientific Hallucination Detection",
author = {Schopf, Tim and
Vladika, Juraj and
F{\"a}rber, Michael and
Matthes, Florian},
editor = "Ghosal, Tirthankar and
Mayr, Philipp and
Singh, Amanpreet and
Naik, Aakanksha and
Rehm, Georg and
Freitag, Dayne and
Li, Dan and
Schimmler, Sonja and
De Waard, Anita",
booktitle = "Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/landing_page/2025.sdp-1.33/",
doi = "10.18653/v1/2025.sdp-1.33",
pages = "344--352",
ISBN = "979-8-89176-265-7",
abstract = "Modern generative Large Language Models (LLMs) are capable of generating text that sounds coherent and convincing, but are also prone to producing \textit{hallucinations}, facts that contradict world knowledge. Even in Retrieval-Augmented Generation (RAG) systems, where relevant context is first retrieved and passed in the input, the generated facts can contradict the provided references or not be verifiable from them. This has motivated SciHal 2025, a shared task that focuses on the detection of hallucinations in scientific content. The two subtasks focused on: (1) predicting whether a claim from a generated LLM answer is entailed, contradicted, or unverifiable by the used references; (2) predicting a fine-grained category of erroneous claims. Our best-performing approach used an ensemble of fine-tuned encoder-only ModernBERT and DeBERTa-v3 models for classification. Out of nine competing teams, our approach achieved first place in sub-task 1 and second place in sub-task 2."
}
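
For illustration, a minimal sketch (not the authors' released code) of the NLI-style classification setup the abstract describes: an encoder-only model such as DeBERTa-v3 classifies a (reference, claim) pair as entailed, contradicted, or unverifiable. The checkpoint name, label order, and example texts below are assumptions; the paper fine-tunes and ensembles such models on the shared-task data.

    # Sketch only: base (not fine-tuned) checkpoint, assumed label order.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "microsoft/deberta-v3-base"   # assumed base checkpoint
    LABELS = ["entailed", "contradicted", "unverifiable"]  # assumed label order

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS)
    )

    # The reference acts as the NLI premise, the generated claim as the hypothesis.
    reference = "The cited study reports a 12% improvement on the benchmark."
    claim = "The referenced work improves benchmark performance."

    inputs = tokenizer(reference, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(LABELS[logits.argmax(dim=-1).item()])

Without task-specific fine-tuning the classifier head is randomly initialized, so the prediction above is arbitrary; the sketch only shows the input pairing and three-way label scheme.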