NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT

Thanet Markchom; Subin Jung; Huizhi Liang

doi:10.18653/v1/2024.semeval-1.39

Abstract

One of the key challenges in Natural Language Generation (NLG) is “hallucination,” in which the generated output appears fluent and grammatically sound but contains incorrect information. To address this challenge, “SemEval-2024 Task 6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes” was introduced. This task focuses on detecting overgeneration hallucinations in texts generated by Large Language Models (LLMs) across various NLG tasks. To tackle it, this paper proposes two methods: (1) hypothesis-target similarity, which measures the text similarity between a generated text (hypothesis) and an intended reference text (target), and (2) a SelfCheckGPT-based method that assesses hallucinations via predefined prompts designed for different NLG tasks. Experiments were conducted on the dataset provided for this task. The results show that both proposed methods can effectively detect hallucinations in LLM-generated texts, although there is still room for improvement.
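As a rough illustration of the two ideas above, the Python sketch below scores a hypothesis in two ways. It is not the authors' implementation: the abstract does not say which similarity measure, prompts, or thresholds were used. Here the similarity is plain bag-of-words cosine similarity, and the mapping "1 − similarity" to a hallucination probability, the prompt wording, the Yes/No/other scoring, the 0.5 threshold, the label strings, and the `ask_llm` callable (a stand-in for any LLM call) are all assumptions. Every function name is hypothetical. The second function follows the prompt-based variant of SelfCheckGPT in spirit: the hypothesis is checked against several sampled texts, and the average inconsistency becomes the score.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def bag_of_words(text: str) -> Counter:
    # Hypothetical helper: lowercased alphanumeric tokens (a deliberately simple tokenizer).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def hypothesis_target_similarity(hyp: str, tgt: str) -> float:
    """Cosine similarity of bag-of-words vectors (an assumed stand-in measure)."""
    a, b = bag_of_words(hyp), bag_of_words(tgt)
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no tokens with the other text
    return dot / (norm_a * norm_b)


def similarity_hallucination_prob(hyp: str, tgt: str) -> float:
    # Assumed mapping: the less the hypothesis resembles the target,
    # the more likely it is treated as a hallucination.
    return 1.0 - hypothesis_target_similarity(hyp, tgt)


def selfcheck_prompt_score(hyp: str, samples: List[str],
                           ask_llm: Callable[[str], str]) -> float:
    """SelfCheckGPT-style prompt score: mean inconsistency of hyp against samples.

    Each sampled text serves as context, and the LLM is asked whether hyp is
    supported by it. First word "yes" -> 0.0, "no" -> 1.0, anything else -> 0.5.
    """
    if not samples:
        raise ValueError("need at least one sampled text")
    scores = []
    for sample in samples:
        prompt = (f"Context: {sample}\nSentence: {hyp}\n"
                  "Is the sentence supported by the context above? Answer Yes or No.")
        words = re.findall(r"[a-z]+", ask_llm(prompt).lower())
        first = words[0] if words else ""
        if first == "yes":
            scores.append(0.0)
        elif first == "no":
            scores.append(1.0)
        else:
            scores.append(0.5)  # unclear answer counts as half-inconsistent
    return sum(scores) / len(scores)


def to_label(prob: float, threshold: float = 0.5) -> str:
    # Assumed threshold and label strings; not values reported in the abstract.
    return "Hallucination" if prob > threshold else "Not Hallucination"
```

A purely lexical measure like this one has an obvious blind spot: "The capital of France is Berlin." and "Paris is the capital of France." share five of six tokens, so their cosine similarity is 5/6 and the hallucination would be missed. That is one reason an embedding-based or entailment-based similarity, or the sampling-based check in the second function, may be a better fit in practice.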

Anthology ID:: 2024.semeval-1.39
Volume:: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:: June
Year:: 2024
Address:: Mexico City, Mexico
Editors:: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 253–260
URL:: https://aclanthology.org/2024.semeval-1.39
DOI:: 10.18653/v1/2024.semeval-1.39
Cite (ACL):: Thanet Markchom, Subin Jung, and Huizhi Liang. 2024. NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 253–260, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):: NU-RU at SemEval-2024 Task 6: Hallucination and Related Observable Overgeneration Mistake Detection Using Hypothesis-Target Similarity and SelfCheckGPT (Markchom et al., SemEval 2024)
PDF:: https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.39.pdf
Supplementary material:: 2024.semeval-1.39.SupplementaryMaterial.txt
