Dhruv Goyal


2026

We present Team Paradise’s systems for three tasks in the SMM4H-HeaRD 2026 shared task: multilingual adverse drug event detection (Task 1), influenza vaccine effectiveness estimation via two-subtask classification (Task 3), and opioid impact span extraction (Task 7). For Task 1, a threshold-only ablation on XLM-RoBERTa-large achieves a macro-F1 of 0.597, exceeding the field mean (0.547) by 0.050. For Task 3, a three-stage hybrid pipeline combining twitter-RoBERTa-base-2022 with rule-based post-processing achieves a micro-F1 of 0.8434 on Subtask 1 (vaccination status) and 0.8936 on Subtask 2 (test results). For Task 7, RoBERTa-large with CRF decoding and sliding-window inference obtains a relaxed F1 of 0.60 despite a severe distributional shift between training and test data. Across tasks, we identify class imbalance, temporal ambiguity, and platform heterogeneity as the central challenges.
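The threshold-only ablation for Task 1 amounts to tuning the decision threshold on a classifier's predicted probabilities while leaving the model weights untouched. A minimal sketch follows, assuming binary ADE labels, macro-F1 computed as the unweighted mean of per-class F1, and a hypothetical 0.01-step grid searched on development data; the function names are illustrative and not taken from the system's code.

```python
from typing import List, Optional, Sequence, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no relevant counts at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the F1 scores of class 0 and class 1 (assumed binary labels)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


def tune_threshold(
    y_true: Sequence[int],
    probs: Sequence[float],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, macro-F1) maximizing macro-F1; ties keep the earliest grid value."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # hypothetical 0.01-step grid
    best_t, best_score = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]  # only the cut-off changes
        score = macro_f1(y_true, preds)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy development-set labels and probabilities (illustrative only).
t, score = tune_threshold([0, 0, 1, 1], [0.1, 0.3, 0.6, 0.9])
print(t, score)  # → 0.31 1.0
```

Selecting the threshold on development data rather than on the test set avoids an optimistic estimate; under heavy class imbalance the selected threshold can land well away from the default 0.5.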
Neuro-symbolic Basis for Robust Syllogistic Reasoning Under Distractors
We present our submission to SemEval-2026 Task 11 Subtasks 2 and 4, on syllogistic premise retrieval with distractors. Our system is a robustness-first neuro-symbolic pipeline. Its key innovation is single-call joint abstraction: rather than parsing each statement independently, a single LLM call jointly abstracts all premises and the conclusion into categorical logical forms (A/E/I/O) whose symbolic (X/Y/Z) term mappings are globally consistent. This consistency enables reliable detection of the shared middle term required for syllogistic validation. The parsed forms are passed to an exhaustive O(n²) search over premise pairs, and each candidate pair is validated deterministically against the 24 valid Aristotelian syllogistic forms via a constant-time lookup. Ablation studies show that more theoretically sophisticated variants degrade performance when logical-form extraction is the primary bottleneck. Our approach achieves competitive rankings in both English and multilingual settings while remaining simple, deterministic, and content-invariant.
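The premise-pair search and validity lookup can be sketched as follows. This is an illustrative sketch, not the submitted code: it assumes each parsed statement is a (mood, subject, predicate) triple over the jointly abstracted symbols, takes the conclusion's subject and predicate as the minor and major terms, and stores the 24 valid forms (the traditional list, which assumes existential import) as (figure, mood) pairs in a set so that each check is a constant-time lookup. The names `Stmt`, `figure_of`, and `valid_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple


class Stmt(NamedTuple):
    mood: str  # 'A', 'E', 'I', or 'O'
    subj: str  # subject term symbol, e.g. 'X'
    pred: str  # predicate term symbol, e.g. 'Y'


# The 24 valid forms, moods written as major + minor + conclusion, keyed by figure.
VALID_BY_FIGURE = {
    1: ("AAA", "EAE", "AII", "EIO", "AAI", "EAO"),
    2: ("EAE", "AEE", "EIO", "AOO", "EAO", "AEO"),
    3: ("IAI", "AII", "OAO", "EIO", "AAI", "EAO"),
    4: ("AEE", "IAI", "EIO", "AAI", "EAO", "AEO"),
}
VALID_FORMS = frozenset((fig, m) for fig, moods in VALID_BY_FIGURE.items() for m in moods)


def figure_of(major: Stmt, minor: Stmt, s: str, p: str) -> Optional[int]:
    """Figure (1-4) if `major` links P to a middle term M and `minor` links S to the same M."""
    if major.pred == p:
        m_major, major_m_first = major.subj, True   # M-P
    elif major.subj == p:
        m_major, major_m_first = major.pred, False  # P-M
    else:
        return None
    if minor.subj == s:
        m_minor, minor_m_first = minor.pred, False  # S-M
    elif minor.pred == s:
        m_minor, minor_m_first = minor.subj, True   # M-S
    else:
        return None
    if m_major != m_minor or m_major in (s, p):
        return None  # no shared middle term
    figures = {(True, False): 1, (False, False): 2, (True, True): 3, (False, True): 4}
    return figures[(major_m_first, minor_m_first)]


def valid_pairs(premises: List[Stmt], concl: Stmt) -> List[Tuple[int, int]]:
    """Exhaustive O(n^2) search; returns (major_index, minor_index) for every valid pair."""
    s, p = concl.subj, concl.pred  # minor and major terms come from the conclusion
    hits = []
    for i, j in combinations(range(len(premises)), 2):
        for maj, mnr in ((i, j), (j, i)):  # try both role assignments
            fig = figure_of(premises[maj], premises[mnr], s, p)
            if fig is None:
                continue
            mood = premises[maj].mood + premises[mnr].mood + concl.mood
            if (fig, mood) in VALID_FORMS:  # constant-time set lookup
                hits.append((maj, mnr))
    return hits


# 'W' is a distractor term; premises 2 and 0 form Barbara (figure 1, AAA).
premises = [Stmt("A", "X", "Y"), Stmt("I", "W", "X"), Stmt("A", "Y", "Z")]
print(valid_pairs(premises, Stmt("A", "X", "Z")))  # → [(2, 0)]
```

Trying both role assignments inside each unordered pair keeps the search exhaustive while visiting each of the n(n−1)/2 pairs once.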
Team 0704mis addressed SemEval-2026 Task 11 Subtask 3 by building a neuro-symbolic system for multilingual syllogistic validity classification across 12 typologically diverse languages. A neural parser extracts logical forms from the text, and a symbolic verifier validates them against the full set of 24 valid Aristotelian forms via a hash lookup. Our standout contribution is the dual-view consistency test: the system compares a "native" parse of the original text with a parse of a "masked" version in which content terms are replaced by abstract symbols (X, Y, Z), and it proceeds with high confidence only if both views agree. By comparing how the model interprets the same logic in two formats, the system can detect whether the model’s reasoning changes when the context shifts from real-world entities to abstract symbols. The primary goal is to combat belief bias: the human-like tendency of LLMs to accept invalid arguments whose conclusions sound true and to reject valid arguments whose conclusions sound false. By enforcing this dual-view check, we found that symbol abstraction (View B) acts as a structural regularizer, forcing the model to ignore semantic interference and focus on the relationships between terms.
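One way to implement the agreement check is to compare the two parses up to a consistent renaming of terms, so that a native parse over words such as "dogs" and a masked parse over X/Y/Z count as equal when their logical structure matches. The sketch below assumes both views have already been parsed into (mood, subject, predicate) triples; the renaming-by-first-appearance scheme, the function names, and the fallback to the masked view on disagreement are hypothetical choices rather than the described system.

```python
from typing import Dict, List, Tuple

Form = Tuple[str, str, str]  # (mood, subject term, predicate term)


def canonicalize(forms: List[Form]) -> List[Form]:
    """Rename terms to T0, T1, ... in order of first appearance (structure-only view)."""
    names: Dict[str, str] = {}
    out: List[Form] = []
    for mood, subj, pred in forms:
        for term in (subj, pred):
            if term not in names:
                names[term] = f"T{len(names)}"
        out.append((mood, names[subj], names[pred]))
    return out


def dual_view_gate(native: List[Form], masked: List[Form]) -> Tuple[List[Form], str]:
    """Return a parse and a confidence tag: 'high' only when both views agree structurally."""
    if canonicalize(native) == canonicalize(masked):
        return masked, "high"
    # Hypothetical fallback: keep the masked view (View B) but flag it as low confidence.
    return masked, "low"


native = [("A", "dogs", "mammals"), ("A", "mammals", "animals"), ("A", "dogs", "animals")]
masked = [("A", "X", "Y"), ("A", "Y", "Z"), ("A", "X", "Z")]
print(dual_view_gate(native, masked)[1])  # → high
```

Because the renaming is consistent, a parse that merges two distinct content terms into one symbol, or splits one term into two, fails the comparison.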
This paper introduces a simple approach for predicting how plausible a word sense is in short narratives where meaning is ambiguous. We use 13 hand-crafted features, including text statistics, word-level similarity computed with basic set-based comparisons, and measures of annotator disagreement. Five diverse, largely independent traditional machine learning models are combined in a weighted ensemble with minimal tuning. Despite its theoretical grounding in classical disambiguation methods, our system performs at essentially chance level on the official test set, with a Spearman correlation (ρ) of −0.038 and an accuracy-within-standard-deviation score of 0.542. This result indicates that surface-level lexical features, while interpretable, are insufficient for graded sense plausibility prediction without deep semantic representations. Because the features are inspired by classical word sense disambiguation techniques and incorporate signals derived from human disagreement, the model’s plausibility predictions remain largely interpretable. This negative result provides important baselines and insights for future work on graded word sense disambiguation.
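To make the feature and ensemble description concrete, the sketch below shows one set-based similarity (Jaccard overlap of lowercased word sets), a normalized weighted average over per-model predictions, and Spearman's ρ computed with average ranks for ties, the correlation reported above. The actual 13 features, the five models, and their weights are not given here, so the whitespace tokenization, the function names, and the example weights are assumptions.

```python
import math
from typing import List, Sequence


def jaccard(a: str, b: str) -> float:
    """Set-based word overlap |A ∩ B| / |A ∪ B| on lowercased whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def weighted_ensemble(preds: List[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Normalized weighted average of per-model predictions (one list per model)."""
    total = sum(weights)
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total for i in range(n)]


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors; NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


print(round(jaccard("she sat by the bank", "the river bank"), 3))  # → 0.333
print(weighted_ensemble([[3.0, 4.0], [5.0, 2.0]], [0.75, 0.25]))  # → [3.5, 3.5]
```

A ρ near zero, as reported on the test set, means the ensemble's ranking of items is essentially unrelated to the ranking implied by the human ratings.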
We present Paradise, our system for SemEval-2026 Task 12: Abductive Event Reasoning, which identifies plausible direct causes of real-world English-language events using retrieved contextual documents. Our approach combines Qwen2.5-7B-Instruct, a 7-billion-parameter instruction-tuned language model, with carefully engineered chain-of-thought prompting; it requires no task-specific fine-tuning or training-data supervision (prompt components were selected using the development set). By integrating explicit causal-inference rules, 4,000-character document context windows, and greedy decoding, the system achieves a score of 0.79 on the official 612-instance test set. Analysis reveals that conservative prediction patterns (87.1% of predictions are single-label and 36.9% include Option D) effectively exploit the asymmetric scoring metric. Ablation studies show that document context contributes +6.4 points, chain-of-thought reasoning +5.3 points, and explicit causal rules +3.1 points to development-set performance. Our code is publicly available at https://github.com/DhruvGoyal404/semeval2026-task12.
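The interaction between conservative predictions and an asymmetric metric can be illustrated with a small scoring function. The rule below is an assumption for illustration only (full credit for an exact match, half credit for a non-empty strict subset of the gold options, zero if any wrong option is included); the official task description defines the actual metric, and the function names are hypothetical.

```python
from typing import AbstractSet, Sequence


def score_instance(pred: AbstractSet[str], gold: AbstractSet[str]) -> float:
    """Assumed asymmetric rule: 1.0 exact, 0.5 non-empty strict subset, else 0.0."""
    if pred == gold:
        return 1.0
    if pred and pred < gold:  # strict subset: some gold options missed, none wrong
        return 0.5
    return 0.0


def mean_score(preds: Sequence[AbstractSet[str]], golds: Sequence[AbstractSet[str]]) -> float:
    """Average per-instance score over a test set."""
    return sum(score_instance(p, g) for p, g in zip(preds, golds)) / len(golds)


preds = [{"A"}, {"A", "C"}]
golds = [{"A", "B"}, {"A", "B"}]
print(mean_score(preds, golds))  # → 0.25
```

Under such a rule, a single correct option earns at least 0.5, while adding an uncertain second option risks dropping the score to 0, which is one way conservative single-label predictions can pay off.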