Soumyajit Roy
2026
Ambirig at SemEval-2026 Task 5: Distributional Ordinal Modelling for Ambiguous Word Senses in Narrative Contexts
Soumyajit Roy
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Word Sense Disambiguation (WSD) has traditionally been framed as selecting a single correct sense for a word in context. However, human language understanding often involves ambiguity, underspecification, and graded plausibility judgments rather than categorical decisions. SemEval-2026 Task 5 explicitly targets this gap by requiring systems to predict human-perceived plausibility scores for word senses in short narratives. In this paper, we present a systematic empirical study of modelling plausibility as an ordinal distribution prediction problem. We hypothesise that standard classification objectives fail to capture the ordinal nature of human uncertainty in this domain. We experimented with complex auxiliary tasks, including Siamese networks, Task-Adaptive Pretraining (TAPT), and transfer learning from Natural Language Inference (NLI), but our results show that these approaches fail in this low-resource setting. Instead, we propose a streamlined architecture based on DeBERTa-v3-base that uses a GlossBERT-style Cross-Encoder optimised with an Earth Mover’s Distance (EMD) loss. By modelling the problem as ordinal regression over a probability distribution and enriching inputs with prototypical examples, our system achieves an accuracy of 73% and a Spearman correlation of 0.593, establishing a robust baseline that outperforms more complex, parameter-heavy approaches.
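To illustrate why an EMD objective suits ordinal plausibility scores, here is a minimal sketch in plain Python, not the system's actual training code. It assumes a 5-point rating scale, builds the target distribution as a normalised histogram of annotator ratings, and uses the normalised squared EMD (r = 2) computed over cumulative distributions; the scale size, the choice of r, and all function names are illustrative assumptions. A real trainer would implement the same arithmetic on tensors.

```python
import math
from typing import List, Sequence


def ratings_to_distribution(ratings: Sequence[int], num_levels: int = 5) -> List[float]:
    """Turn annotator ratings on a 1..num_levels scale into a normalised histogram."""
    counts = [0] * num_levels
    for r in ratings:
        if not 1 <= r <= num_levels:
            raise ValueError(f"rating {r} outside 1..{num_levels}")
        counts[r - 1] += 1
    return [c / len(ratings) for c in counts]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cumulative(p: Sequence[float]) -> List[float]:
    """Running sum of probabilities, i.e. the discrete CDF over ordered bins."""
    out, running = [], 0.0
    for x in p:
        running += x
        out.append(running)
    return out


def emd_loss(pred: Sequence[float], target: Sequence[float], r: int = 2) -> float:
    """Normalised EMD between two distributions over the same ordered bins.

    In 1-D with unit bin spacing, EMD reduces to a distance between the two
    CDFs, so probability mass far from the target bin costs more than nearby mass.
    """
    if len(pred) != len(target):
        raise ValueError("distributions must have the same number of bins")
    diffs = [abs(a - b) ** r for a, b in zip(cumulative(pred), cumulative(target))]
    return (sum(diffs) / len(diffs)) ** (1.0 / r)


def expected_score(p: Sequence[float]) -> float:
    """Collapse a distribution over levels 1..K into one scalar plausibility score."""
    return sum((i + 1) * pi for i, pi in enumerate(p))


# Usage: a one-level miss is penalised less than a two-level miss,
# whereas cross-entropy against a one-hot target would score both the same.
target = ratings_to_distribution([3, 3, 3, 3])  # [0.0, 0.0, 1.0, 0.0, 0.0]
one_off = [0.0, 0.0, 0.0, 1.0, 0.0]
two_off = [0.0, 0.0, 0.0, 0.0, 1.0]
print(emd_loss(one_off, target))  # ≈ 0.447
print(emd_loss(two_off, target))  # ≈ 0.632
pred = softmax([0.1, 0.3, 2.0, 0.5, -1.0])  # e.g. logits from a 5-way head
print(emd_loss(pred, target), expected_score(pred))
```

Predicting a full distribution and reading off its expected value keeps annotator disagreement in the training signal while still yielding a single score for rank-correlation metrics such as Spearman.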
ModusPonens at SemEval-2026 Task 11: Breaking the Plausibility Trap in LLMs via Conflict-Aware Ensembling
Soumyajit Roy | Manav Malhotra
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Large Language Models (LLMs) often struggle to disentangle formal logical validity from real-world plausibility, a phenomenon known as belief bias. This paper describes our submission to SemEval-2026 Task 11. We frame the task as a calibration problem between "System 1" (heuristic) and "System 2" (logical) thinking. Our experiments reveal that standard neuro-symbolic interventions, such as Structural Chain-of-Thought (CoT) and Nonsense Augmentation, degrade performance in low-resource regimes because of an "abstraction penalty". Instead, we propose a Conflict-Aware Logit Ensemble. We fine-tune two variants of Qwen-2.5-14B: a standard "Believer" model and a bias-hardened "Skeptic" model trained on oversampled conflict data, i.e. examples in which logical validity and plausibility disagree. By ensembling their logits via soft voting, we achieve a Pareto-optimal balance between accuracy and content effect, reducing the Total Content Effect (TCE) to 3.21 while maintaining an overall accuracy of 94.27%, for a Combined Score of 39.09.
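The soft-voting step can be sketched as follows, assuming each fine-tuned model emits two logits over the labels ("valid", "invalid"), that each model's logits are turned into probabilities with a softmax, and that the probabilities are mixed with a single weight. The label order, the `skeptic_weight` parameter and its default of 0.5 are hypothetical choices for illustration, not values reported in the paper; averaging raw logits instead of probabilities is an equally plausible reading of the abstract.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ("valid", "invalid")  # hypothetical label order shared by both models


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(
    believer_logits: Sequence[float],
    skeptic_logits: Sequence[float],
    skeptic_weight: float = 0.5,  # hypothetical mixing weight
) -> Tuple[str, List[float]]:
    """Weighted average of the two models' class probabilities, then argmax."""
    if not 0.0 <= skeptic_weight <= 1.0:
        raise ValueError("skeptic_weight must lie in [0, 1]")
    p_believer = softmax(believer_logits)
    p_skeptic = softmax(skeptic_logits)
    probs = [
        (1.0 - skeptic_weight) * b + skeptic_weight * s
        for b, s in zip(p_believer, p_skeptic)
    ]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Usage: the Believer leans "valid" on a plausible-sounding syllogism,
# while the Skeptic is confident it is "invalid".
label, probs = soft_vote([2.0, 0.0], [0.0, 3.0])
print(label, probs)  # invalid, with probs ≈ [0.464, 0.536]
print(soft_vote([2.0, 0.0], [0.0, 3.0], skeptic_weight=0.2)[0])  # valid
```

Sweeping a weight like `skeptic_weight` on held-out data is one simple way to trace the trade-off between accuracy and content effect that the abstract describes as Pareto-optimal.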
2025
CodeAnubad at BLP-2025 Task 2: Efficient Bangla-to-Python Code Generation via Iterative LoRA Fine-Tuning of Gemma-2
Soumyajit Roy
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
This paper presents our submission to Task 2 of the Bangla Language Processing (BLP) Workshop, which focuses on generating Python code from Bangla programming prompts in a low-resource setting. We address this challenge by fine-tuning the instruction-tuned gemma-2-9b model with QLoRA, a parameter-efficient fine-tuning (PEFT) method. We propose an iterative self-improvement strategy that augments the extremely limited training data (74 examples) by reusing verified correct predictions from the development set. We also run LoRA rank experiments (8, 16, 32) and observe a clear correlation between rank and accuracy, with rank 32 delivering the best results. Compared to translation-based and retrieval-augmented baselines, our approach achieves significantly higher accuracy, with a pass rate of 47% on the development set and 37% on the hidden test set. These results highlight the effectiveness of combining iterative data augmentation with rank optimisation for specialised, low-resource code generation tasks.
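The iterative self-improvement loop can be sketched as below, under stated assumptions: "verified correct" is taken to mean that a generated program passes the development item's assert-style tests when run in a fresh interpreter, and `fine_tune` is a hypothetical callable standing in for a QLoRA training run that returns a prompt-to-code generator. `Example`, `DevItem`, `passes_tests` and `iterative_self_improvement` are illustrative names, not the submission's code.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Set, Tuple

Generator = Callable[[str], str]  # Bangla prompt -> Python source


@dataclass(frozen=True)
class Example:
    prompt: str  # Bangla programming instruction
    code: str    # Python solution used as the training target


@dataclass(frozen=True)
class DevItem:
    prompt: str
    tests: str   # assert statements that a correct solution must satisfy


def passes_tests(code: str, tests: str, timeout: float = 10.0) -> bool:
    """Run candidate code plus its tests in a separate interpreter process."""
    program = code + "\n\n" + tests + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def iterative_self_improvement(
    train: Sequence[Example],
    dev: Sequence[DevItem],
    fine_tune: Callable[[List[Example]], Generator],  # hypothetical training step
    rounds: int = 3,
) -> Tuple[Optional[Generator], List[Example]]:
    data = list(train)
    accepted: Set[str] = set()  # dev prompts already added, to avoid duplicates
    model: Optional[Generator] = None
    for _ in range(rounds):
        model = fine_tune(data)
        new_examples = []
        for item in dev:
            if item.prompt in accepted:
                continue
            candidate = model(item.prompt)
            if passes_tests(candidate, item.tests):
                new_examples.append(Example(item.prompt, candidate))
                accepted.add(item.prompt)
        if not new_examples:  # nothing new verified: further rounds would repeat
            break
        data.extend(new_examples)
    return model, data
```

Because the loop executes model-generated code, a real setup should run `passes_tests` inside a proper sandbox rather than directly on the host interpreter.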