Sujal Maharjan

2026

Sagarmatha at SemEval-2026 Task 9: Heterogeneous Ensembling and Hierarchical Task Conditioning for Multilingual Latent Distributional Divergence Modeling
Sujal Maharjan | Astha Shrestha | Pratikshya Shrestha
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

The digital public square is increasingly fragmented by affective polarization, requiring computational systems capable of identifying discursive strategies such as dehumanization and vilification. This paper presents Sagarmatha, the system developed for SemEval-2026 Task 9. We propose a heterogeneous ensemble architecture that addresses the limitations of standard transformer fine-tuning across 22 languages. Our approach integrates mDeBERTa-v3, ReMBERT, LaBSE, mmBERT, and XLM-RoBERTa through two primary architectural pillars: learnable weighted layer pooling and hierarchical task conditioning. While our final submission (a broad ensemble, R3) demonstrated high stability on the leaderboard, our primary architectural configuration (Weighted Polyglot, R1) yielded superior performance in complex multi-label tasks. The system ranked 1st globally in English and Hausa manifestation identification, and 1st in Telugu detection (2nd in categorization). All code and resources are available at https://github.com/SUJAL390/SagarmathaatSemevaltask9.git.
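The abstract names learnable weighted layer pooling without spelling out its formulation. As a rough illustration only, the sketch below shows the forward pass of one common variant: a scalar logit per encoder layer is passed through a softmax, and the layers' vectors are combined with the resulting weights. The function names, the toy vectors, and the choice of softmax normalization are assumptions, not details taken from the paper; in a real system the logits would be trained parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_pool(layer_vectors, layer_logits):
    """Combine one vector per encoder layer into a single vector.

    layer_vectors: list of equal-length float lists, one per layer.
    layer_logits:  one scalar per layer (assumed learnable elsewhere).
    """
    if len(layer_vectors) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, layer_vectors))
        for d in range(dim)
    ]


# Hypothetical toy input: three layers, two dimensions each.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = weighted_layer_pool(layers, [0.0, 0.0, 0.0])  # equal logits give a plain average
```

With equal logits the pooling reduces to averaging the layers; training the logits lets the model emphasize whichever layers are most useful for a given task.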


Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification
Sujal Maharjan | Astha Shrestha | Lakshmojee Koduru | Sweta Poudel | Shuvam Shiwakoti | Rabin Thapa | Kritesh Rauniyar | Surendrabikram Thapa
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource environments. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus comprising 10,226 Devanagari sentences. The dataset includes annotations for trigger spans and event types, achieving high inter-annotator agreement with Fleiss’ kappa = 0.812 for trigger identification and kappa = 0.855 for event classification. Our dataset was developed through a rigorous iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We conduct benchmarking across a broad spectrum of approaches, including classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) using zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve superior classification performance, while traditional methods and few-shot prompting struggle with the challenges of exact span extraction in morphologically complex contexts. Furthermore, we quantify performance differences between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
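For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Fleiss' kappa is computed from a table of per-item category counts. The function name and the toy ratings are hypothetical and are not the NepEE annotations; the sketch assumes every item is rated by the same number of annotators.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of rows, one row per item.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy table: 4 sentences, 5 annotators, 3 event types.
toy_counts = [
    [5, 0, 0],
    [4, 1, 0],
    [0, 5, 0],
    [0, 1, 4],
]
kappa = fleiss_kappa(toy_counts)
```

Values close to 1 indicate agreement well above chance, which is the sense in which the reported 0.812 and 0.855 are described as high.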

2025


RankedCOMET: Elevating a 2022 Baseline to a Top-5 Finish in the WMT 2025 QE Task
Sujal Maharjan | Astha Shrestha
Proceedings of the Tenth Conference on Machine Translation

This paper presents rankedCOMET, a lightweight per-language-pair calibration applied to the publicly available Unbabel/wmt22-comet-da model that yields a competitive Quality Estimation (QE) system for the WMT 2025 shared task. This approach transforms raw model outputs into per-language average ranks and min–max normalizes those ranks to [0,1], maintaining intra-language ordering while generating consistent numeric ranges across language pairs. Applied to 742,740 test segments and submitted to Codabench, this unsupervised post-processing enhanced the aggregated Pearson correlation on the preliminary snapshot and led to a 5th-place finish. We provide detailed pseudocode, ablations (including a negative ensemble attempt), and a reproducible analysis pipeline that reports Pearson, Spearman, and Kendall correlations with bootstrap confidence intervals.
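The calibration described in this abstract can be sketched roughly as follows: group scores by language pair, convert them to average ranks (ties share the mean of their positions), and min–max scale the ranks to [0, 1] within each group. This is an illustrative reading of the abstract, not the authors' released pipeline; the function names, the toy scores, and the fallback value of 0.5 for a group whose scores are all identical are assumptions.

```python
from collections import defaultdict


def average_ranks(scores):
    """Return 1-based ranks; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_calibrate(scores, lang_pairs):
    """Rank scores within each language pair, then min-max scale to [0, 1]."""
    groups = defaultdict(list)
    for idx, pair in enumerate(lang_pairs):
        groups[pair].append(idx)

    calibrated = [0.0] * len(scores)
    for indices in groups.values():
        ranks = average_ranks([scores[i] for i in indices])
        low, high = min(ranks), max(ranks)
        for i, rank in zip(indices, ranks):
            # Assumption: a group with no rank spread maps to the midpoint.
            calibrated[i] = (rank - low) / (high - low) if high > low else 0.5
    return calibrated


# Hypothetical raw scores on very different scales for two language pairs.
raw = [0.2, 0.9, 0.5, 10.0, 30.0, 20.0]
pairs = ["en-de", "en-de", "en-de", "en-ja", "en-ja", "en-ja"]
calibrated = rank_calibrate(raw, pairs)
```

Because only ranks are used, the ordering of segments inside each language pair is unchanged, while every pair ends up on the same numeric range.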


HOPE at TSAR 2025 Shared Task: Balancing Control and Complexity in Readability-Controlled Text Simplification
Sujal Maharjan | Astha Shrestha
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

This paper describes our submissions to the TSAR 2025 Shared Task on Readability-Controlled Text Simplification. We present a comparative study of three architectures: a rule-based baseline, a heuristic-driven expert system, and a zero-shot generative T5 pipeline with a semantic guardrail. Our analysis shows a trade-off between the controllability of rule-based systems and the fluency of generative models. In this zero-shot setting, simpler, confined systems achieved superior meaning preservation scores compared to the more powerful but less predictable generative model. We present a diagnostic failure analysis on system outputs, illustrating how different architectures result in distinct error patterns such as under-simplification, information loss via heuristics, and semantic drift.


Multimodal Kathmandu@CASE 2025: Task-Specific Adaptation of Multimodal Transformers for Hate, Stance, and Humor Detection
Sujal Maharjan | Astha Shrestha | Shuvam Thakur | Rabin Thapa
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

The multimodal ambiguity of text-embedded images (memes), particularly those pertaining to marginalized communities, presents a significant challenge for natural language and vision processing. The subtle interaction between text, image, and cultural context makes it challenging to develop robust moderation tools. This paper tackles this challenge across four key tasks: (A) Hate Speech Detection, (B) Hate Target Classification, (C) Topical Stance Classification, and (D) Intended Humor Detection. We demonstrate that the nuances of these tasks demand a departure from a ‘one-size-fits-all’ approach. Our central contribution is a task-specific methodology, where we align model architecture with the specific challenges of each task, all built upon a common CLIP-ViT backbone. Our results illustrate the strong performance of this task-specific approach, with multiple architectures excelling at each task. For Hate Speech Detection (Task A), the Co-Attention Ensemble model achieved a top F1-score of 0.7929; for Hate Target Classification (Task B), our Hierarchical Cross-Attention Transformer achieved an F1-score of 0.5777; and for Stance (Task C) and Humor Detection (Task D), our Two-Stage Multiplicative Fusion Framework yielded leading F1-scores of 0.6070 and 0.7529, respectively. Beyond raw results, we also provide detailed error analyses, including confusion matrices, to reveal weaknesses driven by multimodal ambiguity and class imbalance. Ultimately, this work provides a blueprint for the community, establishing that optimal performance in multimodal analysis is achieved not by a single superior model, but through the customized design of specialized solutions, supported by empirical validation of key methodological choices.

Co-authors

Shuvam Shiwakoti 1

Pratikshya Shrestha 1

Shuvam Thakur 1

Surendrabikram Thapa 1

Venues

WMT 1
