Abstract
One of the key challenges in Natural Language Generation (NLG) is “hallucination,” in which the generated output appears fluent and grammatically sound but may contain incorrect information. To address this challenge, “SemEval-2024 Task 6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes” is introduced. This task focuses on detecting overgeneration hallucinations in texts generated from Large Language Models for various NLG tasks. To tackle this task, this paper proposes two methods: (1) hypothesis-target similarity, which measures text similarity between a generated text (hypothesis) and an intended reference text (target), and (2) a SelfCheckGPT-based method to assess hallucinations via predefined prompts designed for different NLG tasks. Experiments were conducted on the dataset provided in this task. The results show that both of the proposed methods can effectively detect hallucinations in LLM-generated texts with a possibility for improvement.- Anthology ID:
- 2024.semeval-1.39
- Volume:
- Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 253–260
- Language:
- URL:
- https://aclanthology.org/2024.semeval-1.39
- DOI:
- 10.18653/v1/2024.semeval-1.39
- Cite (ACL):
- Thanet Markchom, Subin Jung, and Huizhi Liang. 2024. NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 253–260, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT (Markchom et al., SemEval 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.39.pdf