UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection

Michelle Wastl, Jannis Vamvas, Rico Sennrich


Abstract
This paper presents our system developed for SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. The objective of this task is to identify spans of hallucinated text in the output of large language models across 14 high- and low-resource languages. To address this challenge, we propose two consistency-based approaches: (a) token-level consistency with a superior LLM and (b) token-level self-consistency with the underlying model of the sequence that is to be evaluated. Our results show that both approaches are effective compared to simple mark-all baselines, competitive with other submissions to the shared task, and, for some languages, competitive with GPT4o-mini prompt-based approaches.
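The abstract does not spell out how token-level consistency is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes that several alternative responses to the same prompt are available (from a stronger LLM for approach (a), or resampled from the same model for approach (b)), and that a token of the evaluated response counts as "supported" when it also occurs in enough of those alternatives. The function names `token_support` and `hallucinated_spans`, the case-insensitive exact-match criterion, and the `threshold` parameter are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def token_support(tokens: List[str], samples: List[List[str]]) -> List[float]:
    """Return, for each token of the evaluated response, the fraction of
    sampled responses that contain it (case-insensitive exact match).

    Assumption: exact lexical overlap stands in for whatever consistency
    measure the actual system uses.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    sample_sets = [{t.lower() for t in sample} for sample in samples]
    return [
        sum(tok.lower() in s for s in sample_sets) / len(sample_sets)
        for tok in tokens
    ]


def hallucinated_spans(
    tokens: List[str],
    samples: List[List[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Merge consecutive low-support tokens into (start, end) token-index
    spans, with `end` exclusive."""
    support = token_support(tokens, samples)
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(support):
        if score < threshold:
            if start is None:
                start = i  # open a new flagged span
        elif start is not None:
            spans.append((start, i))  # close the span at the first supported token
            start = None
    if start is not None:
        spans.append((start, len(tokens)))  # span runs to the end of the response
    return spans


if __name__ == "__main__":
    evaluated = ["Paris", "is", "the", "capital", "of", "Spain"]
    alternatives = [
        ["Paris", "is", "the", "capital", "of", "France"],
        ["The", "capital", "of", "France", "is", "Paris"],
        ["Paris", "is", "the", "capital", "of", "France"],
    ]
    print(hallucinated_spans(evaluated, alternatives))
```

In this toy example only the final token lacks support in the alternatives, so it is the only one flagged. A real system would likely need subword-aware alignment and a mapping from token indices back to character offsets, since the shared task asks for spans of hallucinated text.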
Anthology ID:
2025.semeval-1.38
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
257–270
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.38/
Cite (ACL):
Michelle Wastl, Jannis Vamvas, and Rico Sennrich. 2025. UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 257–270, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection (Wastl et al., SemEval 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.38.pdf