IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations?
Nihed Bendahman, Karen Pinel-sauvagnat, Gilles Hubert, Mokhtar Billami
Abstract
This article presents our participation in Task 6 of SemEval-2024, SHROOM (a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes), which aims at detecting hallucinations. We propose two types of approaches for the task: the first is based on sentence embeddings and the cosine similarity metric, and the second uses Large Language Models (LLMs). We found that LLMs fail to improve on the performance achieved by the embedding models, which outperform the baseline provided by the organizers; our best system achieves 78% accuracy.

- Anthology ID: 2024.semeval-1.86
- Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue: SemEval
- SIG: SIGLEX
- Publisher: Association for Computational Linguistics
- Pages: 573–578
- URL: https://aclanthology.org/2024.semeval-1.86
- DOI: 10.18653/v1/2024.semeval-1.86
- Cite (ACL): Nihed Bendahman, Karen Pinel-sauvagnat, Gilles Hubert, and Mokhtar Billami. 2024. IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations?. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 573–578, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations? (Bendahman et al., SemEval 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.semeval-1.86.pdf
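The abstract describes an embedding-based detector: compare the sentence embedding of a model's output against that of its source, and flag low cosine similarity as a hallucination. A minimal sketch of that idea follows; note that the encoder, the 0.5 threshold, and the function names here are illustrative assumptions, not the paper's actual configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def label(src_emb, hyp_emb, threshold=0.5):
    """Flag the hypothesis as a hallucination when its embedding
    drifts too far from the source embedding (hypothetical threshold)."""
    sim = cosine_similarity(src_emb, hyp_emb)
    return "Hallucination" if sim < threshold else "Not Hallucination"

# Toy 2-d embeddings; a real system would encode sentences
# with a sentence-embedding model.
print(label([1.0, 0.0], [1.0, 0.0]))  # → Not Hallucination
print(label([1.0, 0.0], [0.0, 1.0]))  # → Hallucination
```

In practice the two vectors would come from a sentence encoder applied to the source text and the generated hypothesis; the threshold would be tuned on labeled validation data.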