Jianfei Xu
2026
NCL HKU-NarrSim at SemEval-2026 Task 4: Aspect-Based Agents and Supervised Contrastive Embeddings for Narrative Similarity
Jianfei Xu | Ting Zhu | Mingyang Chen | Huizhi(elly) Liang
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
SemEval-2026 Task 4 on Narrative Similarity requires models to assess narrative alignment between stories rather than relying on surface lexical similarity. For Track A, we introduce Aspect-Based Narrative Similarity Agents (ABNS-Agents), a two-stage agent-based framework. It first extracts three core narrative aspects, aligned with the task definition, under a schema constraint, and then performs aspect-aligned similarity adjudication using an LLM decision model. For Track B, Narrative Supervised Contrastive Embeddings (NSConE) applies supervised contrastive learning to learn embeddings that model narrative similarity. Our experiments show that ABNS-Agents achieves 70.25% accuracy on the test set, while NSConE reaches 68.5% test accuracy, demonstrating competitive performance across both reasoning-based and representation-learning paradigms. The findings highlight the effectiveness of aspect-aligned structured modelling and task-specific supervised contrastive learning for capturing narrative similarity beyond surface semantics.
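To illustrate the kind of objective a supervised contrastive embedding model optimises, here is a minimal NumPy sketch of the standard supervised contrastive (SupCon) loss. This is not the NSConE implementation: the function name `supcon_loss`, the temperature of 0.1, and the assumption that `labels` group stories judged narratively similar are all hypothetical. Each anchor is pulled towards embeddings sharing its label and pushed away from the rest of the batch.

```python
import numpy as np


def supcon_loss(embeddings: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Hypothetical sketch of a SupCon-style loss, not the authors' code.

    embeddings: (n, d) array of story embeddings.
    labels: (n,) array; equal labels are ASSUMED to mark narratively similar stories.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # An anchor is never compared with itself: exp(-inf) = 0 in the denominator.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over each row (log-sum-exp trick).
    row_max = np.max(sim, axis=1, keepdims=True)
    log_denom = row_max + np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: other samples in the batch with the same label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors without any positive are skipped

    # Average log-probability of the positives for each valid anchor.
    pos_log_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_pos = pos_log_sum[valid] / pos_counts[valid]
    return float(-mean_log_pos.mean())


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    # Lower loss when the labels agree with the geometry of the embeddings.
    print(supcon_loss(emb, np.array([0, 0, 1, 1])))
    print(supcon_loss(emb, np.array([0, 1, 0, 1])))
```

In this sketch the self-similarity is masked with `-inf` rather than dropped, so the matrix keeps a fixed shape and the log-sum-exp stays numerically stable at low temperatures.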
2025
NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SelfCheckGPT
Jiaying Hong | Thanet Markchom | Jianfei Xu | Tong Wu | Huizhi Liang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
SemEval-2025 Task 3 (Mu-SHROOM) focuses on detecting hallucinations in content generated by various large language models (LLMs) across multiple languages. The task involves not only identifying the presence of hallucinations but also pinpointing the exact spans where they occur. To tackle this challenge, this study introduces two methods: Modified-RefChecker (MRC) and Modified-SelfCheckGPT-H (MSCGH). MRC integrates prompt-based factual verification into the references, structuring them as claim-based tests rather than as single external knowledge sources. MSCGH incorporates external knowledge to overcome SelfCheckGPT's reliance on the LLM's internal knowledge. In addition, the original prompt designs of both methods are enhanced to identify hallucinated words within LLM-generated texts. Experimental results demonstrate the effectiveness of the approach, which achieves a high ranking on the test dataset in detecting hallucinations across various languages, with an average IoU of 0.5310 and an average COR of 0.5669. The source code used in this paper is available at https://github.com/jianfeixu95/NCL-UoR.
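As a rough illustration of how a span-level IoU score such as the one reported above can be computed, the following sketch compares predicted and gold hallucination spans as sets of character positions. It assumes spans are half-open `(start, end)` character offsets and that two empty span sets count as perfect agreement; the helper names `span_iou` and `mean_iou` are hypothetical, and the official Mu-SHROOM scorer may treat edge cases differently.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # ASSUMED half-open character offsets: [start, end)


def span_iou(pred_spans: List[Span], gold_spans: List[Span]) -> float:
    """Hypothetical character-level IoU between predicted and gold spans."""
    pred = {i for start, end in pred_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}
    union = pred | gold
    if not union:
        # Assumption: no hallucination predicted and none annotated is a perfect match.
        return 1.0
    return len(pred & gold) / len(union)


def mean_iou(preds: List[List[Span]], golds: List[List[Span]]) -> float:
    """Average the per-text IoU over a dataset (one span list per text)."""
    scores = [span_iou(p, g) for p, g in zip(preds, golds)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Predicted characters 0-4, gold characters 3-7: overlap 2, union 8.
    print(span_iou([(0, 5)], [(3, 8)]))
```

Working on character sets rather than token spans keeps the score independent of any tokenizer, which matters when the same metric is averaged across many languages.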