Maha Bhaashya at SemEval-2024 Task 6: Zero-Shot Multi-task Hallucination Detection
Patanjali Bhamidipati, Advaith Malladi, Manish Shrivastava, Radhika Mamidi
Abstract
In recent studies, the extensive utilization of large language models has underscored the importance of robust evaluation methodologies for assessing text generation quality and relevance to specific tasks. This has revealed a prevalent issue known as hallucination, an emergent condition in the model where generated text lacks faithfulness to the source and deviates from the evaluation criteria. In this study, we formally define hallucination and propose a framework for its quantitative detection in a zero-shot setting, leveraging our definition and the assumption that model outputs entail task and sample specific inputs. In detecting hallucinations, our solution achieves an accuracy of 0.78 in a model-aware setting and 0.61 in a model-agnostic setting. Notably, our solution maintains computational efficiency, requiring far less computational resources than other SOTA approaches, aligning with the trend towards lightweight and compressed models.
- Anthology ID:
- 2024.semeval-1.241
- Volume:
- Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1685–1689
- URL:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.semeval-1.241/
- DOI:
- 10.18653/v1/2024.semeval-1.241
- Cite (ACL):
- Patanjali Bhamidipati, Advaith Malladi, Manish Shrivastava, and Radhika Mamidi. 2024. Maha Bhaashya at SemEval-2024 Task 6: Zero-Shot Multi-task Hallucination Detection. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1685–1689, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Maha Bhaashya at SemEval-2024 Task 6: Zero-Shot Multi-task Hallucination Detection (Bhamidipati et al., SemEval 2024)
- PDF:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.semeval-1.241.pdf