Yoshiko Kakita
2025
Overview of the SciHal25 Shared Task on Hallucination Detection for Scientific Content
Dan Li | Bogdan Palfi | Colin Zhang | Jaiganesh Subramanian | Adrian Raudaschl | Yoshiko Kakita | Anita De Waard | Zubair Afzal | Georgios Tsatsaronis
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
This paper provides an overview of the Hallucination Detection for Scientific Content (SciHal) shared task, held at the 2025 ACL Scholarly Document Processing workshop. The task invites participants to detect hallucinated claims in answers to research-oriented questions generated by real-world GenAI-powered research assistants. The task is formulated as a multi-label classification problem in which each instance consists of a question, an answer, an extracted claim, and supporting reference abstracts. Participants are asked to label claims under two subtasks: (1) coarse-grained detection with the labels Entailment, Contradiction, or Unverifiable; and (2) fine-grained detection with a more detailed taxonomy of 8 types.
The dataset consists of 500 research-oriented questions collected over one week from a generative assistant tool. These questions were rewritten using GPT-4o and manually reviewed to address potential privacy or commercial concerns. In total, 10,000 reference abstracts were retrieved, and 4,592 claims were extracted from the assistant’s answers. Each claim is annotated with hallucination labels. The dataset is divided into 3,592 training, 500 validation, and 500 test instances.
Subtask 1 received 88 submissions from 10 teams, while Subtask 2 received 39 submissions from 6 teams, resulting in a total of 5 published technical reports. This paper summarizes the task design, dataset, participation, and key findings.