FunghiFunghi at SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Tariq Ballout; Pieter Jansma; Nander Koops; Yong Hui Zhou

Abstract

Large Language Models (LLMs) often generate hallucinated content, which is factually incorrect or misleading, posing reliability challenges. The Mu-SHROOM shared task addresses hallucination detection in multilingual LLM-generated text. This study employs SpanBERT, a transformer model optimized for span-based predictions, to identify hallucinated spans across multiple languages. To address limited training data, we apply dataset augmentation through translation and synthetic generation. The model is evaluated using Intersection over Union (IoU) for span detection and Spearman’s correlation for ranking consistency. While the model detects hallucinated spans with moderate accuracy, it struggles with ranking confidence scores. These findings highlight the need for improved probability calibration and multilingual robustness. Future work should refine ranking methods and explore ensemble models for better performance.
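
The span-level IoU mentioned in the abstract can be illustrated with a minimal sketch. It assumes spans are given as half-open (start, end) character offsets and that two empty span sets count as perfect agreement; the function name `char_iou` is hypothetical, and the official task scorer may differ in these details.

```python
def char_iou(pred_spans, gold_spans):
    """Intersection over Union of the character indices covered by two span lists.

    Spans are assumed to be half-open (start, end) character offsets.
    """
    pred = {i for start, end in pred_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}
    if not pred and not gold:
        # Assumption: no predicted and no gold hallucination is full agreement.
        return 1.0
    return len(pred & gold) / len(pred | gold)


# Two predicted spans compared against one gold span.
score = char_iou([(0, 5), (10, 14)], [(3, 12)])
```

Here the predicted and gold spans share 4 characters out of 14 covered in total, so `score` is 4/14.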

Anthology ID: 2025.semeval-1.211
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1602–1608
URL: https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.211/
Cite (ACL): Tariq Ballout, Pieter Jansma, Nander Koops, and Yong Hui Zhou. 2025. FunghiFunghi at SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 1602–1608, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): FunghiFunghi at SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes (Ballout et al., SemEval 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.211.pdf
