Soumyajit Roy


2026

Word Sense Disambiguation (WSD) has traditionally been framed as selecting a single correct sense given a context. However, human language understanding often involves ambiguity, underspecification, and graded plausibility judgments rather than categorical decisions. SemEval-2026 Task 5 explicitly targets this gap by requiring systems to predict human-perceived plausibility scores for word senses in short narratives. In this paper, we present a systematic empirical study of modelling plausibility as an ordinal distribution prediction problem. We hypothesise that standard classification objectives fail to capture the ordinal nature of human uncertainty in this domain. Although we experimented with more complex approaches, including Siamese networks, Task-Adaptive Pretraining (TAPT), and transfer learning from Natural Language Inference (NLI), our results show that these approaches fail in this low-resource setting. Instead, we propose a streamlined architecture based on DeBERTa-v3-base, using a GlossBERT-style Cross-Encoder optimised with an Earth Mover’s Distance (EMD) loss. By modelling the problem as ordinal regression over a probability distribution and enriching inputs with prototypical examples, our system achieves an accuracy of 73% and a Spearman correlation of 0.593, establishing a robust baseline that outperforms more complex, parameter-heavy approaches.
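
The EMD objective named above can be illustrated with a minimal sketch. It assumes plausibility scores are binned into five ordered classes with unit spacing (the bin count is an assumption, not stated in the abstract), in which case 1-D EMD reduces to the L1 distance between the predicted and gold cumulative distributions. The function name `emd_loss` is hypothetical and this is not the submitted implementation.

```python
import numpy as np


def emd_loss(pred_probs, target_probs, squared=False):
    """Earth Mover's Distance between distributions over ordered bins.

    Assumes unit spacing between bins, so EMD equals the L1 distance
    between the two cumulative distribution functions (CDFs).
    Works on a single distribution or a batch (last axis = bins).
    """
    pred = np.asarray(pred_probs, dtype=float)
    target = np.asarray(target_probs, dtype=float)
    # Difference of CDFs along the ordinal (last) axis.
    cdf_diff = np.cumsum(pred, axis=-1) - np.cumsum(target, axis=-1)
    if squared:
        per_item = np.sum(cdf_diff ** 2, axis=-1)  # squared-CDF variant
    else:
        per_item = np.sum(np.abs(cdf_diff), axis=-1)
    return float(np.mean(per_item))  # average over the batch


gold = [1, 0, 0, 0, 0]       # all annotator mass on the lowest score
near_miss = [0, 1, 0, 0, 0]  # off by one bin
far_miss = [0, 0, 0, 0, 1]   # off by four bins
print(emd_loss(near_miss, gold))  # → 1.0
print(emd_loss(far_miss, gold))   # → 4.0
```

Cross-entropy depends only on the probability assigned to the gold bin, so it scores both wrong predictions above identically; EMD charges the distant prediction four times as much, which is the ordinal behaviour the abstract motivates. Whether the system uses the plain or squared variant is not specified here.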

Large Language Models (LLMs) often struggle to disentangle formal logical validity from real-world plausibility, a phenomenon known as "belief bias". This paper describes our submission to SemEval-2026 Task 11. We frame the task as a calibration problem between "System 1" (heuristic) and "System 2" (logical) thinking. Our experiments reveal that standard neuro-symbolic interventions, such as Structural Chain-of-Thought (CoT) and Nonsense Augmentation, degrade performance in low-resource regimes due to an "abstraction penalty". Instead, we propose a Conflict-Aware Logit Ensemble. We fine-tune two variants of Qwen-2.5-14B: a standard "Believer" model and a bias-hardened "Skeptic" model trained on oversampled conflict data. By ensembling their logits via soft voting, we achieve a Pareto-optimal balance, reducing the Total Content Effect (TCE) to 3.21 while maintaining an overall accuracy of 94.27%, for a Combined Score of 39.09.
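
The Believer/Skeptic ensemble can be sketched as a weighted average of the two models' logits over a shared label set, followed by a softmax. This is a minimal illustration under stated assumptions, not the submitted system: the label order (`valid`, `invalid`), the equal default weight, and the function names are hypothetical, and in practice the logits would come from the two fine-tuned Qwen-2.5-14B classifiers.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(
    believer_logits: Sequence[float],
    skeptic_logits: Sequence[float],
    skeptic_weight: float = 0.5,  # hypothetical mixing weight
) -> Tuple[int, List[float]]:
    """Blend two models' class logits; return (predicted index, probabilities)."""
    if len(believer_logits) != len(skeptic_logits):
        raise ValueError("both models must score the same label set")
    w = skeptic_weight
    blended = [(1 - w) * b + w * s for b, s in zip(believer_logits, skeptic_logits)]
    probs = softmax(blended)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


labels = ["valid", "invalid"]  # hypothetical label order
# Believer trusts a believable syllogism; Skeptic flags it as invalid.
idx, probs = soft_vote([2.0, 0.0], [0.0, 3.0])
print(labels[idx])  # → invalid
```

Some soft-voting schemes average probabilities rather than logits; averaging logits, as the abstract describes, is shown here. Tuning `skeptic_weight` is one way to trade accuracy against content effect.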

2025

This paper presents our submission to Task 2 of the Bangla Language Processing (BLP) Workshop, a task that focuses on generating Python code from Bangla programming prompts in a low-resource setting. We address this challenge by fine-tuning the instruction-tuned gemma-2-9b model with parameter-efficient fine-tuning (PEFT) using QLoRA. We propose an iterative self-improvement strategy that augments the extremely limited training data (74 examples) by reusing verified correct predictions from the development set. We also experiment with LoRA ranks of 8, 16, and 32 and observe a clear positive correlation between rank and accuracy, with rank 32 delivering the best results. Compared with translation-based and retrieval-augmented baselines, our approach achieves significantly higher accuracy, with a pass rate of 47% on the development set and 37% on the hidden test set. These results highlight the effectiveness of combining iterative data augmentation with rank optimisation for specialised, low-resource code generation tasks.
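
The iterative self-improvement strategy can be sketched as a loop that retrains on a growing training set and keeps only development-set generations that pass verification. This is a hedged sketch: `train_fn`, `generate_fn`, and `passes_tests` are hypothetical stand-ins for QLoRA fine-tuning, gemma-2-9b generation, and test-case execution, and the default number of rounds is an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (Bangla prompt, reference Python code)
DevItem = Tuple[str, str]  # (Bangla prompt, test code used for verification)


def self_improve(
    train_set: Sequence[Example],
    dev_set: Sequence[DevItem],
    train_fn: Callable[[List[Example]], Any],          # hypothetical: fine-tune
    generate_fn: Callable[[Any, str], str],            # hypothetical: generate code
    passes_tests: Callable[[str, str], bool],          # hypothetical: run tests
    rounds: int = 3,                                   # assumed round budget
) -> Tuple[Any, List[Example]]:
    """Grow the training set with verified dev-set generations and retrain."""
    augmented: List[Example] = list(train_set)
    seen = {prompt for prompt, _ in augmented}
    model = train_fn(augmented)
    for _ in range(rounds):
        verified: List[Example] = []
        for prompt, tests in dev_set:
            if prompt in seen:
                continue  # already solved in an earlier round
            code = generate_fn(model, prompt)
            if passes_tests(code, tests):  # keep only verified-correct code
                verified.append((prompt, code))
                seen.add(prompt)
        if not verified:
            break  # no new verified examples; stop early
        augmented.extend(verified)
        model = train_fn(augmented)  # retrain on the enlarged set
    return model, augmented
```

With toy stubs where generation succeeds on more prompts as the training set grows, the loop adds newly solved development prompts in each round and stops once a round yields nothing new. Deduplicating by prompt keeps a solved example from being added twice.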