Varun Ojha


2026

This task introduces Abductive Event Reasoning (AER), a novel shared task, to investigate the ability of Large Language Models(LLMs) to reason about the causality of real-world events. More specifically, a data set consisting of different topics and choices is introduced, and we need to enable the model to select the best options for the given event. Three methods are separately introduced to explore thequestion, including the traditional natural language processing(NLP) method (DeBERTa), theenhanced knowledge graph(KG), and the KG embedded in hyperbolic space.

2024

SemEval-2024 Task 8 introduces the challenge of identifying machine-generated texts from diverse Large Language Models (LLMs) in various languages and domains. The task comprises three subtasks: binary classification in monolingual and multilingual (Subtask A), multi-class classification (Subtask B), and mixed text detection (Subtask C). This paper focuses on Subtask A & B. To tackle this task, this paper proposes two methods: 1) using traditional machine learning (ML) with natural language preprocessing (NLP) for feature extraction, and 2) fine-tuning LLMs for text classification. For fine-tuning, we use the train datasets provided by the task organizers. The results show that transformer models like LoRA-RoBERTa and XLM-RoBERTa outperform traditional ML models, particularly in multilingual subtasks. However, traditional ML models performed better than transformer models for the monolingual task, demonstrating the importance of considering the specific characteristics of each subtask when selecting an appropriate approach.

2023

In SemEval-2023 Task 1, a task of applying Word Sense Disambiguation in an image retrieval system was introduced. To resolve this task, this work proposes three approaches: (1) an unsupervised approach considering similarities between word senses and image captions, (2) a supervised approach using a Siamese neural network, and (3) a self-supervised approach using a Bayesian personalized ranking framework. According to the results, both supervised and self-supervised approaches outperformed the unsupervised approach. They can effectively identify correct images of ambiguous words in the dataset provided in this task.