Yaru Zhao
2026
uir-cis at SemEval-2026 Task 12: Mitigating Prior-Induced Hallucinations in Retrieval-Augmented Reasoning via Precision-Oriented Decoding
Chiyao Zhou | Zebing Wang | Kexin Deng | Yaru Zhao | Lin Deng | Binyang Li
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our system for SemEval-2026 Task 12 on Abductive Event Reasoning (AER). We systematically address the "over-selection" hallucination pathology in instruction-tuned Large Language Models (LLMs), where models erroneously align distractors with semantic priors rather than retrieved evidence. Our framework uses a 32-billion-parameter Qwen2.5 foundation model adapted via Low-Rank Adaptation (LoRA) and evaluated under a zero-shot Chain-of-Thought (CoT) setting. To mitigate epistemic noise, we propose a Precision-Oriented Decoding (POD) strategy that couples low-temperature sampling (T=0.45) with scaled majority voting (K=9). Following a three-stage empirical evolution, from baseline diagnosis to precision optimization and ensemble analysis, our system achieved a score of 0.802 on the official test set. Our findings indicate that in causal reasoning tasks with strict penalization for incorrect predictions, epistemic noise suppression is superior to heuristic recall compensation.
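The decoding idea in the abstract, drawing K low-temperature samples and keeping only what most of them agree on, can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so the per-option strict-majority threshold below is an assumption, as are the function names and the `sample_fn` callable standing in for a call to the fine-tuned model. Only T=0.45 and K=9 come from the abstract.

```python
from collections import Counter
from typing import Callable, List, Set


def precision_oriented_vote(samples: List[Set[str]]) -> Set[str]:
    """Keep an option only if a strict majority of samples selected it.

    Assumption: each sample is the set of option labels chosen in one
    generation; the strict-majority rule is illustrative, not the
    authors' documented rule.
    """
    counts = Counter(option for sample in samples for option in sample)
    threshold = len(samples) / 2
    return {option for option, count in counts.items() if count > threshold}


def decode(
    sample_fn: Callable[[float], Set[str]],  # hypothetical model wrapper
    k: int = 9,                 # K from the abstract
    temperature: float = 0.45,  # T from the abstract
) -> Set[str]:
    """Draw k samples at the given temperature and vote over them."""
    samples = [sample_fn(temperature) for _ in range(k)]
    return precision_oriented_vote(samples)
```

Under this rule, an option that appears in only a few of the nine samples is dropped, which trades some recall for precision. That matches the abstract's emphasis on suppressing over-selection when wrong predictions are penalized.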
2025
uir-cis at SemEval-2025 Task 3: Detection of Hallucinations in Generated Text
Jia Huang | Shuli Zhao | Yaru Zhao | Tao Chen | Weijia Zhao | Hangui Lin | Yiyang Chen | Binyang Li
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
The widespread deployment of large language models (LLMs) across diverse domains has underscored the critical need to ensure the credibility and accuracy of their generated content, particularly in the presence of hallucinations. These hallucinations can severely compromise both the practical performance of models and the security of their applications. In response to this issue, SemEval-2025 Task 3 Mu-SHROOM: Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes introduces a more granular task for hallucination detection. This task seeks to identify hallucinations in text, accurately locate hallucinated segments, and assess their credibility. In this paper, we present a three-stage method for fine-grained hallucination detection and localization. First, we transform the text into a triplet representation, facilitating more precise hallucination analysis. Next, we leverage a large language model to generate fact-reference texts that correspond to the triplets. Finally, we employ a fact alignment strategy to identify and localize hallucinated segments by evaluating the semantic consistency between the extracted triplets and the generated reference texts. We evaluate our method on the unlabelled test set across all languages in Task 3, demonstrating strong detection performance and validating its effectiveness in multilingual contexts.
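The final alignment stage described above can be pictured with a toy sketch. It assumes the first two stages (triplet extraction and LLM-generated reference texts) have already run, and it substitutes simple case-insensitive substring matching for the semantic-consistency check, which the abstract does not detail. The `Triplet` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Triplet:
    subject: str
    relation: str
    obj: str
    span: Tuple[int, int]  # character offsets of the object in the source text


def is_supported(triplet: Triplet, reference: str) -> bool:
    """Stand-in consistency check: subject and object both occur in the reference."""
    ref = reference.lower()
    return triplet.subject.lower() in ref and triplet.obj.lower() in ref


def localize_hallucinations(
    triplets: List[Triplet], references: List[str]
) -> List[Tuple[int, int]]:
    """Return the spans of triplets that their paired reference text does not support."""
    return [t.span for t, r in zip(triplets, references) if not is_supported(t, r)]


text = "Paris is the capital of Germany."
start = text.index("Germany")
triplet = Triplet("Paris", "capital of", "Germany", (start, start + len("Germany")))
flagged = localize_hallucinations([triplet], ["Paris is the capital of France."])
```

Here `flagged` holds the character span of "Germany", because the reference text mentions the subject but not the claimed object. A real system would replace `is_supported` with a semantic similarity or entailment model.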