SLPL SHROOM at SemEval2024 Task 06 : A comprehensive study on models ability to detect hallucination

Pouya Fallah; Soroush Gooran; Mohammad Jafarinasab; Pouya Sadeghi; Reza Farnia; Amirreza Tarabkhah; Zeinab Sadat Taghavi; Hossein Sameti

Abstract

Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text. This study explores methods for detecting hallucinations in three SemEval2024 Task 6 tasks: Machine Translation, Definition Modeling, and Paraphrase Generation. We evaluate two methods: semantic similarity between the generated text and factual references, and an ensemble of language models that judge each other’s outputs. Our results show that semantic similarity achieves moderate accuracy and correlation scores in trial data, while the ensemble method offers insights into the complexities of hallucination detection but falls short of expectations. This work highlights the challenges of hallucination detection and underscores the need for further research in this critical area.

Anthology ID: 2024.semeval-1.167
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1148–1154
URL: https://aclanthology.org/2024.semeval-1.167
Cite (ACL): Pouya Fallah, Soroush Gooran, Mohammad Jafarinasab, Pouya Sadeghi, Reza Farnia, Amirreza Tarabkhah, Zeinab Sadat Taghavi, and Hossein Sameti. 2024. SLPL SHROOM at SemEval2024 Task 06 : A comprehensive study on models ability to detect hallucination. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1148–1154, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): SLPL SHROOM at SemEval2024 Task 06 : A comprehensive study on models ability to detect hallucination (Fallah et al., SemEval 2024)
PDF: https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.167.pdf
Supplementary material: 2024.semeval-1.167.SupplementaryMaterial.txt
