Saurav Aryal


2025

Howard University-AI4PC at SemEval-2025 Task 10: Ensembling LLMs for Multi-lingual Multi-Label and Multi-Class Meta-Classification
Saurav Aryal | Prasun Dhungana
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes our approach and submission to the SemEval 2025 shared task on “Multilingual Characterization and Extraction of Narratives from Online News”. The purpose of this task was to assign primary and fine-grained roles to named entities in news articles from five different languages, on the topics of Climate Change and Ukraine-Russia War. In this paper, we explain how we approached the task by utilizing multiple LLMs via Prompt Engineering and combining their results into a final task result through an ensemble meta-classification technique. Our experimental results demonstrate that this integrated approach outperforms the provided baseline in detecting bias, deception, and manipulation in news media across multiple languages.
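As a rough illustration of combining several prompted LLMs into one prediction, the sketch below uses plain majority voting over per-model labels. This is an assumption made for illustration: the abstract does not specify the meta-classifier, and the function name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label in a non-empty list of labels.

    Ties are broken in favour of the label that appears first in the list,
    i.e. the earliest model's prediction (an arbitrary illustrative choice).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of three prompted LLMs for one entity mention.
model_outputs = ["Antagonist", "Protagonist", "Antagonist"]
final_role = majority_vote(model_outputs)
```

A learned meta-classifier trained on the per-model outputs would replace `majority_vote` in a stronger ensemble; voting is only the simplest baseline.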

Howard University-AI4PC at SemEval-2025 Task 11: Combining Expert Personas via Prompting for Enhanced Multilingual Emotion Analysis
Amir Ince | Saurav Aryal
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

For our approach to SemEval-2025 Task 11, we employ a multi-tier evaluation framework for perceived emotion analysis. Our system consists of a smaller-parameter-size large language model that independently predicts a given text’s perceived emotion while explaining the reasoning behind its decision. The initial model’s persona is varied through careful prompting, allowing it to represent multiple perspectives. These outputs, including both predictions and reasoning, are aggregated and fed into a final decision-making model that determines the ultimate emotion classification. We evaluated our approach in the official SemEval Task 11 evaluation on subtasks A and C in all the languages provided.

Howard University-AI4PC at SemEval-2025 Task 8: DeepTabCoder - Code-based Retrieval and In-context Learning for Question-Answering over Tabular Data
Saharsha Tiwari | Saurav Aryal
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our approach, named DeepTabCoder, to SemEval 2025 - Task 8: DataBench, which focuses on question-answering over tabular data. We utilize a code-based retrieval system combined with in-context learning, which generates and executes code to answer questions, leveraging DeepSeek-V3 for code generation. DeepTabCoder outperforms the baseline, achieving accuracies of 81.42% on the DataBench dataset and 80.46% on the DataBench Lite dataset.

Howard University-AI4PC at SemEval-2025 Task 4: Unlearning Sensitive Content From Large Language Models Using Finetuning and Distillation for Selective Knowledge Removal
Aayush Acharya | Saurav Aryal
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our approach and submission to the SemEval 2025 task on “Unlearning Sensitive Content from Large Language Models.” The task focuses on making LLMs forget specific knowledge, such as copyrighted material and personally identifiable information (PII), without needing expensive retraining from scratch on the OLMo model. We propose a method to unlearn using fine-tuning and knowledge distillation. Our approach involves fine-tuning separate models on “retain” and “forget” datasets to preserve or suppress knowledge selectively. We then distill the model by suppressing logarithmic data from the fine-tuned model without learning using a combined loss of L2, KL divergence and cosine similarity while retaining knowledge from the fine-tuned model with retention using KL divergence loss.

Howard University-AI4PC at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval-Combining Zero-Shot Claim Extraction and KNN-Based Classification for Multilingual Claim Matching
Suprabhat Rijal | Saurav Aryal
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

SemEval Task 7 introduced a dataset for multilingual and cross-lingual fact checking. We propose a system that leverages similarity matching, KNN, zero-shot classification and summarization to retrieve fact-checks for social media posts across multiple languages. Our approach achieves performance within the expected range, aligning with baseline results. Although the results are competitive, the findings highlight both the potential and the challenges of zero-shot methods, providing a foundation for future research in cross-lingual information verification.
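The similarity-matching and KNN step can be pictured with a minimal sketch: rank candidate fact-checks by cosine similarity to a post and keep the k nearest. The toy vectors, the identifiers `fc1` to `fc3`, and the function names are hypothetical; in practice the vectors would come from a multilingual text encoder, which the abstract does not name.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, fact_check_vecs, k=2):
    """Return the ids of the k fact-checks most similar to the query vector."""
    scored = [(cosine(query_vec, vec), fc_id) for fc_id, vec in fact_check_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fc_id for _, fc_id in scored[:k]]

# Toy embeddings standing in for encoded fact-checks and one social media post.
fact_checks = {
    "fc1": [1.0, 0.0, 0.0],
    "fc2": [0.7, 0.7, 0.0],
    "fc3": [0.0, 0.0, 1.0],
}
post = [0.9, 0.1, 0.0]
nearest = top_k(post, fact_checks, k=2)
```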

Howard University - AI4PC at SemEval-2025 Task 3: Logit-based Supervised Token Classification for Multilingual Hallucination Span Identification Using XGBOD
Saurav Aryal | Mildness Akomoize
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes our system for SemEval-2025 Task 3, Mu-SHROOM, which focuses on detecting hallucination spans in multilingual LLM outputs. We reframe hallucination detection as a point-wise anomaly detection problem by treating logits as time-series data. Our approach extracts features from token-level logits, addresses class imbalance with SMOTE, and trains an XGBOD model for probabilistic character-level predictions. Our system, which relies solely on information derived from the logits and token offsets (using pretrained tokenizers), achieves competitive intersection-over-union (IoU) and correlation scores on the validation and test sets.
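For readers unfamiliar with the metric, character-level intersection-over-union can be sketched as follows. The half-open `(start, end)` span format, the function name `span_iou`, and the convention of returning 1.0 when both span sets are empty are assumptions of this sketch and may differ from the official scorer.

```python
def span_iou(predicted, gold):
    """IoU between two collections of character spans.

    Each span is a half-open (start, end) pair of character offsets.
    Returns 1.0 when both collections are empty (assumed convention).
    """
    pred_chars = {i for start, end in predicted for i in range(start, end)}
    gold_chars = {i for start, end in gold for i in range(start, end)}
    if not pred_chars and not gold_chars:
        return 1.0
    return len(pred_chars & gold_chars) / len(pred_chars | gold_chars)

# Two overlapping spans: characters 2 and 3 are shared, 0 to 5 are covered overall.
score = span_iou([(0, 4)], [(2, 6)])
```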

Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images
Saurav Aryal | Lawal Abdulmujeeb
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Correctly identifying idiomatic expressions remains a major challenge in Natural Language Processing (NLP), as these expressions often have meanings that cannot be directly inferred from their individual words. The SemEval-2025 Task 1 introduces two subtasks, A and B, designed to test models’ ability to interpret idioms using multimodal data, including both text and images. This paper focuses on Subtask A, where the goal is to determine which among several images best represents the intended meaning of an idiomatic expression in a given sentence. To address this, we employed a two-stage approach. First, we used GPT-4o to analyze sentences, extracting relevant keywords and sentiments to better understand the idiomatic usage. This processed information was then passed to a CLIP-VIT model, which ranked the available images based on their relevance to the idiomatic expression. Our results showed that this approach performed significantly better than directly feeding sentences and idiomatic compounds into the models without preprocessing. Specifically, our method achieved a Top-1 accuracy of 0.67 in English, whereas performance in Portuguese was notably lower at 0.23. These findings highlight both the promise of multimodal approaches for idiom interpretation and the challenges posed by language-specific differences in model performance.

Howard University-AI4PC at SemEval-2025 Task 2: Improving Machine Translation With Context-Aware Entity-Only Pre-translations with GPT4o
Saurav Aryal | Jabez Agyemang-Prempeh
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our work on a 3-Step GPT translation system developed for SemEval-2025 Task 2 to enhance the translation of named entities within machine translation. Our approach integrates (1) entity extraction via Wikidata, (2) GPT-based refinement of entity translations, and (3) final context-aware GPT translation. Results from the original dataset of six languages show significant improvements in the handling of named entities compared to direct GPT-based translation baselines. We further discuss replicability and observed challenges, and outline future research directions.

Howard University-AI4PC at SemEval-2025 Task 9: Using Open-weight BART-MNLI for Zero Shot Classification of Food Recall Documents
Saurav Aryal | Kritika Pant
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We present our system for SemEval-2025 Task 9: Food Hazard Detection, a shared task focused on the explainable classification of food-incident reports. The task involves predicting hazard and product categories (ST1) and their exact vectors (ST2) from short texts. Our approach leverages zero-shot classification using the BART-large-MNLI model, which allows classification without task-specific fine-tuning. Our model achieves competitive performance, emphasizing hazard prediction accuracy, as evaluated by the macro-F1 score.

2023

Howard University Computer Science at SemEval-2023 Task 12: A 2-Step System Design for Multilingual Sentiment Classification with Language Identification
Saurav Aryal | Howard Prioleau
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The recent release of the AfriSenti-SemEval shared Task 12 has made available 14 new datasets annotated for sentiment analysis on African languages. We proposed and evaluated two approaches to this task, Delta TF-IDF, and a proposed Language-Specific Model Fusion Algorithm using Language Identification, both of which produced comparable or better classification performance than the current state-of-the-art models on this task: AfriBERTa, AfroXLMR, and AfroLM.
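To make the Delta TF-IDF idea concrete, here is a minimal sketch of the commonly cited binary (positive vs. negative) formulation, in which a term's count in a document is weighted by the difference between its IDF in the positive and in the negative training set. The add-one smoothing, whitespace tokenization, function names, and toy data are assumptions of this sketch; how the paper adapts the scheme to its sentiment classes is not described in the abstract.

```python
import math
from collections import Counter

def doc_freq(docs):
    """Number of documents in which each whitespace-separated term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df

def delta_tfidf(doc, pos_docs, neg_docs):
    """Map each term in doc to count * (IDF in positives - IDF in negatives).

    Add-one smoothing (an assumption here) avoids division by zero for
    terms unseen in one of the two training sets.
    """
    pos_df, neg_df = doc_freq(pos_docs), doc_freq(neg_docs)
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    weights = {}
    for term, count in Counter(doc.split()).items():
        idf_pos = math.log2((n_pos + 1) / (pos_df[term] + 1))
        idf_neg = math.log2((n_neg + 1) / (neg_df[term] + 1))
        weights[term] = count * (idf_pos - idf_neg)
    return weights

# Toy training sets; real inputs would be the labelled AfriSenti tweets.
positive = ["good great", "good"]
negative = ["bad", "bad awful"]
weights = delta_tfidf("good bad good", positive, negative)
```

Under this sign convention, terms concentrated in positive documents receive negative weights and terms concentrated in negative documents receive positive ones; a linear classifier trained on these features is indifferent to which sign is used.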