IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations?

Nihed Bendahman, Karen Pinel-sauvagnat, Gilles Hubert, Mokhtar Billami


Abstract
This article presents our participation in Task 6 of SemEval-2024, named SHROOM (a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes), which aims at detecting hallucinations. We propose two types of approaches for the task: the first is based on sentence embeddings and the cosine similarity metric, and the second uses LLMs (Large Language Models). We found that LLMs fail to improve the performance achieved by embedding generation models. The latter outperform the baseline provided by the organizers, and our best system achieves 78% accuracy.
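The embedding-based approach compares a model's output against a reference via cosine similarity of their sentence embeddings, flagging low-similarity outputs as hallucinations. A minimal sketch of this idea is shown below; the 0.5 decision threshold and the dummy vectors are illustrative assumptions (the paper's actual embedding models and thresholds are not specified here), and in practice the vectors would come from a sentence-embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_hallucination(src_emb: np.ndarray, hyp_emb: np.ndarray,
                       threshold: float = 0.5) -> bool:
    """Flag the hypothesis as a hallucination when its embedding is
    insufficiently similar to the source/reference embedding.
    The threshold value is a placeholder, not the paper's tuned value."""
    return cosine_similarity(src_emb, hyp_emb) < threshold

# Illustrative vectors standing in for real sentence embeddings.
faithful = flag_hallucination(np.array([1.0, 0.0]), np.array([1.0, 0.1]))
divergent = flag_hallucination(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```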
Anthology ID:
2024.semeval-1.86
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
573–578
URL:
https://aclanthology.org/2024.semeval-1.86
Cite (ACL):
Nihed Bendahman, Karen Pinel-sauvagnat, Gilles Hubert, and Mokhtar Billami. 2024. IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations?. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 573–578, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations? (Bendahman et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.semeval-1.86.pdf
Supplementary material:
2024.semeval-1.86.SupplementaryMaterial.txt