Kuan Eeik Tan



2025

FiRC-NLP at SemEval-2025 Task 3: Exploring Prompting Approaches for Detecting Hallucinations in LLMs
Wondimagegnhue Tufa | Fadi Hassan | Guillem Collell | Dandan Tu | Yi Tu | Sang Ni | Kuan Eeik Tan
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents a system description for the SemEval Mu-SHROOM task, focusing on detecting hallucination spans in the outputs of instruction-tuned Large Language Models (LLMs) across 14 languages. We compare two distinct approaches: Prompt-Based Approach (PBA), which leverages the capability of LLMs to detect hallucination spans using different prompting strategies, and the Fine-Tuning-Based Approach (FBA), which fine-tunes pre-trained Language Models (LMs) to extract hallucination spans in a supervised manner. Our experiments reveal that PBA, especially when incorporating explicit references or external knowledge, outperforms FBA. However, the effectiveness of PBA varies across languages, likely due to differences in language representation within LLMs.

2022

SeqL at SemEval-2022 Task 11: An Ensemble of Transformer Based Models for Complex Named Entity Recognition Task
Fadi Hassan | Wondimagegnhue Tufa | Guillem Collell | Piek Vossen | Lisa Beinborn | Adrian Flanagan | Kuan Eeik Tan
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents the system we used to participate in task 11 (MultiCoNER) of the SemEval 2022 competition. Our system ranked fourth in track 12 (Multilingual) and fifth in track 13 (Code-Mixed). The goal of track 12 is to detect complex named entities in a multilingual setting, while track 13 is dedicated to detecting complex named entities in a code-mixed setting. Both systems were developed using transformer-based language models. We used an ensemble of XLM-RoBERTa-large and Microsoft/infoxlm-large with a Conditional Random Field (CRF) layer. In addition, we describe the algorithms employed to train our models and our hyper-parameter selection. We also study the impact of different methods for aggregating the outputs of the individual models that compose our ensemble. Finally, we present an extensive analysis of the results and errors.