@inproceedings{kobs-etal-2024-pollice,
title = "Pollice Verso at {S}em{E}val-2024 Task 6: The {R}oman Empire Strikes Back",
author = "Kobs, Konstantin and
Pfister, Jan and
Hotho, Andreas",
editor = {Ojha, Atul Kr. and
Do{\u{g}}ru{\"o}z, A. Seza and
Tayyar Madabushi, Harish and
Da San Martino, Giovanni and
Rosenthal, Sara and
Ros{\'a}, Aiala},
booktitle = "Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2024.semeval-1.219/",
doi = "10.18653/v1/2024.semeval-1.219",
pages = "1529--1536",
abstract = "We present an intuitive approach for hallucination detection in LLM outputs that is modeled after how humans would go about this task. We engage several LLM ``experts'' to independently assess whether a response is hallucinated. For this we select recent and popular LLMs smaller than 7B parameters. By analyzing the log probabilities for tokens that signal a positive or negative judgment, we can determine the likelihood of hallucination. Additionally, we enhance the performance of our ``experts'' by automatically refining their prompts using the recently introduced OPRO framework. Furthermore, we ensemble the replies of the different experts in a uniform or weighted manner, which builds a quorum from the expert replies. Overall this leads to accuracy improvements of up to 10.6 p.p. compared to the challenge baseline. We show that a Zephyr 3B model is well suited for the task. Our approach can be applied in the model-agnostic and model-aware subtasks without modification and is flexible and easily extendable to related tasks."
}