Sujal Maharjan


2026

The digital public square is increasingly fragmented by affective polarization, requiring computational systems capable of identifying discursive strategies such as dehumanization and vilification. This paper presents Sagarmatha, the system developed for SemEval-2026 Task 9. We propose a heterogeneous ensemble architecture that addresses the limitations of standard transformer fine-tuning across 22 languages. Our approach integrates mDeBERTa-v3, RemBERT, LaBSE, mmBERT, and XLM-RoBERTa through two primary architectural pillars: learnable weighted layer pooling and hierarchical task conditioning. While our final submission (a broad ensemble, R3) demonstrated high stability on the leaderboard, our primary architectural configuration (Weighted Polyglot, R1) yielded superior performance in complex multi-label tasks. The system ranked 1st globally in English and Hausa manifestation identification, and 1st in Telugu detection (2nd in categorization). All code and resources are available at https://github.com/SUJAL390/SagarmathaatSemevaltask9.git.
Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource environments. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus comprising 10,226 Devanagari sentences. The dataset includes annotations for trigger spans and event types, achieving high inter-annotator agreement with Fleiss’ kappa = 0.812 for trigger identification and kappa = 0.855 for event classification. Our dataset was developed through a rigorous iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We conduct benchmarking across a broad spectrum of approaches, including classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) using zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve superior classification performance, while traditional methods and few-shot prompting struggle with the challenges of exact span extraction in morphologically complex contexts. Furthermore, we quantify performance differences between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
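For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss' kappa computation. It is not the authors' evaluation script; the function name `fleiss_kappa` and the toy count table are hypothetical, chosen only to illustrate the formula.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters who
    assigned item i to category j. Every item must have the same
    number of raters (hypothetical helper, for illustration only)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item observed agreement P_i, averaged into P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement P_e from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (made up): 3 items, 3 raters, 2 categories.
toy_counts = [[3, 0], [0, 3], [2, 1]]
kappa = fleiss_kappa(toy_counts)
```

In this toy table the raters agree fully on two items and split 2 to 1 on the third, so the statistic falls well below 1 even though most ratings match.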

2025

This paper presents rankedCOMET, a lightweight per-language-pair calibration applied to the publicly available Unbabel/wmt22-comet-da model that yields a competitive Quality Estimation (QE) system for the WMT 2025 shared task. The approach transforms raw model outputs into per-language average ranks and min–max normalizes those ranks to [0,1], preserving intra-language ordering while producing consistent numeric ranges across language pairs. Applied to 742,740 test segments and submitted to Codabench, this unsupervised post-processing improved the aggregated Pearson correlation on the preliminary snapshot and led to a 5th-place finish. We provide detailed pseudocode, ablations (including a negative ensemble attempt), and a reproducible analysis pipeline that reports Pearson, Spearman, and Kendall correlations with bootstrap confidence intervals.
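The calibration described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the paper's released code: the function names `average_ranks` and `rank_calibrate`, the toy scores, and the choice to map a group with a single distinct rank to 0.5 are all assumptions.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_calibrate(scores, lang_pairs):
    """Replace each raw score with its min-max normalized average rank,
    computed separately within each language pair."""
    groups = defaultdict(list)
    for idx, pair in enumerate(lang_pairs):
        groups[pair].append(idx)

    calibrated = [0.0] * len(scores)
    for indices in groups.values():
        ranks = average_ranks([scores[i] for i in indices])
        lo, hi = min(ranks), max(ranks)
        span = hi - lo
        for i, rank in zip(indices, ranks):
            # Assumption: a group with one distinct rank maps to 0.5.
            calibrated[i] = (rank - lo) / span if span > 0 else 0.5
    return calibrated


# Toy data (made up): two language pairs on very different raw scales.
raw = [0.2, 0.9, 0.5, 10.0, 30.0, 20.0]
pairs = ["en-de", "en-de", "en-de", "en-ja", "en-ja", "en-ja"]
calibrated = rank_calibrate(raw, pairs)
```

Because only ranks within each language pair are used, the two groups end up on the same [0,1] range even though their raw scores differ by orders of magnitude, while the ordering inside each group is unchanged.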
This paper describes our submissions to the TSAR 2025 Shared Task on Readability-Controlled Text Simplification. We present a comparative study of three architectures: a rule-based baseline, a heuristic-driven expert system, and a zero-shot generative T5 pipeline with a semantic guardrail. Our analysis shows a trade-off between the controllability of rule-based systems and the fluency of generative models. In this zero-shot setting, simpler, confined systems achieved superior meaning preservation scores compared to the more powerful but less predictable generative model. We present a diagnostic failure analysis on system outputs, illustrating how different architectures result in distinct error patterns such as under-simplification, information loss via heuristics, and semantic drift.
The multimodal ambiguity of text-embedded images (memes), particularly those pertaining to marginalized communities, presents a significant challenge for natural language and vision processing. The subtle interaction between text, image, and cultural context makes it challenging to develop robust moderation tools. This paper tackles this challenge across four key tasks: (A) Hate Speech Detection, (B) Hate Target Classification, (C) Topical Stance Classification, and (D) Intended Humor Detection. We demonstrate that the nuances of these tasks demand a departure from a ‘one-size-fits-all’ approach. Our central contribution is a task-specific methodology, where we align model architecture with the specific challenges of each task, all built upon a common CLIP-ViT backbone. Our results illustrate the strong performance of this task-specific approach, with different architectures excelling at different tasks. For Hate Speech Detection (Task A), the Co-Attention Ensemble model achieved a top F1-score of 0.7929; for Hate Target Classification (Task B), our Hierarchical Cross-Attention Transformer achieved an F1-score of 0.5777; and for Stance (Task C) and Humor Detection (Task D), our Two-Stage Multiplicative Fusion Framework yielded leading F1-scores of 0.6070 and 0.7529, respectively. Beyond raw results, we also provide detailed error analyses, including confusion matrices, to reveal weaknesses driven by multimodal ambiguity and class imbalance. Ultimately, this work provides a blueprint for the community, establishing that optimal performance in multimodal analysis is achieved not by a single superior model, but through the customized design of specialized solutions, supported by empirical validation of key methodological choices.