Pouya Fallah


2024

SLPL SHROOM at SemEval2024 Task 06 : A comprehensive study on models ability to detect hallucination
Pouya Fallah | Soroush Gooran | Mohammad Jafarinasab | Pouya Sadeghi | Reza Farnia | Amirreza Tarabkhah | Zeinab Sadat Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text. This study explores methods for detecting hallucinations in three SemEval-2024 Task 6 tasks: Machine Translation, Definition Modeling, and Paraphrase Generation. We evaluate two methods: semantic similarity between the generated text and factual references, and an ensemble of language models that judge each other’s outputs. Our results show that semantic similarity achieves moderate accuracy and correlation scores in trial data, while the ensemble method offers insights into the complexities of hallucination detection but falls short of expectations. This work highlights the challenges of hallucination detection and underscores the need for further research in this critical area.