Dhruv Goyal

2026

Team Paradise at #SMM4H-HeaRD 2026: Multi-Task Approaches for Social Media Health Mining
Dhruv Goyal | Ishita Gupta | Jatin Bedi
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks

We present Team Paradise’s systems for three tasks in the SMM4H-HeaRD 2026 shared task: multilingual adverse drug event detection (Task 1), influenza vaccine effectiveness estimation via two-subtask classification (Task 3), and opioid impact span extraction (Task 7). For Task 1, a threshold-only ablation on XLM-RoBERTa-large achieves a macro-F1 of 0.597, exceeding the field mean (0.547) by 0.050. For Task 3, a three-stage hybrid pipeline combining twitter-RoBERTa-base-2022 with rule-based post-processing achieves a micro-F1 of 0.8434 on Subtask 1 (vaccination status) and 0.8936 on Subtask 2 (test results). For Task 7, RoBERTa-large with CRF decoding and sliding-window inference obtains a relaxed F1 of 0.60 despite severe train-test distributional shift. Across tasks, we identify class imbalance, temporal ambiguity, and platform heterogeneity as central challenges.

0704mis at SemEval-2026 Task 11: Single-Call Joint Abstraction for Robust Neuro-Symbolic Retrieval
Ishita Gupta | Dhruv Goyal | Jatin Bedi
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Neuro-symbolic Basis for Robust Syllogistic Reasoning Under Distractors. We present our submission to SemEval-2026 Task 11 Subtasks 2 and 4, on syllogistic premise retrieval with distractors. Our system is based on a robustness-first neuro-symbolic pipeline. The key innovation is single-call joint abstraction: rather than parsing each statement independently, one LLM call jointly abstracts all premises and the conclusion into categorical logical forms (A/E/I/O) whose symbolic (X/Y/Z) mappings are globally consistent. This allows reliable detection of the shared middle term needed for syllogistic validation. Parsed forms are passed through an exhaustive O(n²) premise-pair search, with deterministic validation against the 24 valid Aristotelian syllogistic forms via constant-time lookup. Ablation studies show that more theoretically sophisticated variants degrade performance when logical-form extraction is the primary bottleneck. Our approach achieves competitive rankings in both English and multilingual settings while remaining simple, deterministic, and content-invariant.
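The premise-pair search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `Form` type, the `figure` and `find_premise_pair` helpers, and the convention of returning `(major_index, minor_index)` are all hypothetical names chosen for illustration, and the sketch assumes the logical forms have already been extracted with globally consistent term symbols. The table of valid moods is the standard list of 24 forms (including those that rely on existential import).

```python
from itertools import permutations
from typing import NamedTuple, Optional, Sequence, Tuple


class Form(NamedTuple):
    """A categorical statement: kind is 'A', 'E', 'I', or 'O' (hypothetical type)."""
    kind: str
    subject: str
    predicate: str


# The 24 traditionally valid moods, grouped by figure.
_VALID_BY_FIGURE = {
    1: {"AAA", "EAE", "AII", "EIO", "AAI", "EAO"},
    2: {"EAE", "AEE", "EIO", "AOO", "EAO", "AEO"},
    3: {"AAI", "IAI", "AII", "EAO", "OAO", "EIO"},
    4: {"AAI", "AEE", "IAI", "EAO", "EIO", "AEO"},
}
# Flatten into one frozenset so validation is a single hash lookup.
VALID_FORMS = frozenset(
    (mood, fig) for fig, moods in _VALID_BY_FIGURE.items() for mood in moods
)


def figure(major: Form, minor: Form, conclusion: Form) -> Optional[int]:
    """Return the figure (1-4), or None if the terms do not form a syllogism."""
    s, p = conclusion.subject, conclusion.predicate
    major_terms = {major.subject, major.predicate}
    minor_terms = {minor.subject, minor.predicate}
    if s == p or p not in major_terms or s not in minor_terms:
        return None
    # The middle term is whatever remains in each premise; it must be shared.
    mid_major, mid_minor = major_terms - {p}, minor_terms - {s}
    if len(mid_major) != 1 or mid_major != mid_minor:
        return None
    m = next(iter(mid_major))
    if m in (s, p):
        return None
    major_m_is_subject = major.subject == m
    minor_m_is_subject = minor.subject == m
    if major_m_is_subject:
        return 3 if minor_m_is_subject else 1
    return 4 if minor_m_is_subject else 2


def find_premise_pair(
    premises: Sequence[Form], conclusion: Form
) -> Optional[Tuple[int, int]]:
    """Exhaustive O(n^2) search over ordered (major, minor) premise pairs."""
    for i, j in permutations(range(len(premises)), 2):
        fig = figure(premises[i], premises[j], conclusion)
        if fig is None:
            continue
        mood = premises[i].kind + premises[j].kind + conclusion.kind
        if (mood, fig) in VALID_FORMS:
            return i, j
    return None


premises = [
    Form("A", "Y", "Z"),  # All Y are Z (major premise)
    Form("A", "W", "Q"),  # distractor
    Form("A", "X", "Y"),  # All X are Y (minor premise)
]
result = find_premise_pair(premises, Form("A", "X", "Z"))
```

In this toy example, the pair at indices 0 and 2 forms an AAA syllogism in the first figure, so the distractor at index 1 is skipped. Because validation is a set-membership test, the cost is dominated by the pair enumeration.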

Dual-View Consistency Testing for Content-Invariant Multilingual Syllogistic Reasoning
Ishita Gupta | Dhruv Goyal | Jatin Bedi
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Team 0704mis addressed SemEval-2026 Task 11 Subtask 3 by building a neuro-symbolic system for multilingual syllogistic validity classification across 12 typologically diverse languages. A neural parser extracts logical forms from text, which are then validated by a symbolic verifier that implements the full set of 24 valid Aristotelian forms via a hash lookup. Our main contribution is the dual-view consistency test: the system compares a "native" parse of the original text with a "masked" parse in which content terms are replaced by abstract symbols (X, Y, Z), and proceeds with high confidence only if both views agree. Comparing how the model interprets the same logic in two formats reveals whether its reasoning changes when the context shifts from real-world objects to abstract symbols. The primary goal is to combat belief bias, the human-like tendency of LLMs to accept invalid arguments when the conclusion sounds true and to reject valid arguments when the conclusion sounds false. With this dual-view check in place, we found that symbol abstraction (View B) acts as a structural regularizer, forcing the model to ignore semantic interference and focus on the relationships between terms.
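The dual-view consistency test can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the authors' implementation: `Verdict`, `dual_view_check`, and the three callables are hypothetical names, the comparison is made on validity verdicts rather than on the raw parses, and the fallback to the masked view on disagreement is an assumption (the abstract only says that high confidence requires agreement).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Verdict:
    valid: bool            # predicted validity of the syllogism
    high_confidence: bool  # True only when both views agree


def dual_view_check(
    text: str,
    parse_native: Callable[[str], Any],  # View A: parse the original text
    parse_masked: Callable[[str], Any],  # View B: parse with terms masked as X/Y/Z
    verify: Callable[[Any], bool],       # symbolic verifier over a parsed form
) -> Verdict:
    """Run both views and flag the result as confident only on agreement."""
    native_valid = verify(parse_native(text))
    masked_valid = verify(parse_masked(text))
    if native_valid == masked_valid:
        return Verdict(valid=native_valid, high_confidence=True)
    # Assumption: on disagreement, prefer the masked view, since content terms
    # are the likely source of belief bias; the result is marked low confidence.
    return Verdict(valid=masked_valid, high_confidence=False)


# Toy stand-ins for the parsers and verifier (hypothetical, for demonstration).
agree = dual_view_check(
    "All dogs are animals. All poodles are dogs. So all poodles are animals.",
    parse_native=lambda t: ("AAA", 1),
    parse_masked=lambda t: ("AAA", 1),
    verify=lambda form: form == ("AAA", 1),
)
disagree = dual_view_check(
    "Some text where the two views diverge.",
    parse_native=lambda t: ("AAA", 1),
    parse_masked=lambda t: ("AAA", 2),
    verify=lambda form: form == ("AAA", 1),
)
```

Passing the parsers and verifier as callables keeps the check independent of any particular LLM or verifier, so the same wrapper could sit in front of whatever parser is actually used.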

Paradise at SemEval-2026 Task 5: On the Limitations of Surface-Level Features for Graded Word Sense Plausibility Prediction
Dhruv Goyal | Ishita Gupta | Jatin Bedi
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper introduces a simple approach for predicting how plausible a word sense is in short narratives where meaning is ambiguous. We use 13 hand-crafted features, including text statistics, word-level similarity computed with basic set-based comparisons, and measures of annotator disagreement. Five diverse and largely independent traditional machine learning models are combined in a weighted ensemble with minimal tuning. Despite its grounding in classical disambiguation methods, our system achieves essentially random performance on the official test set, with a Spearman correlation (ρ) of −0.038 and an accuracy-within-standard-deviation score of 0.542. This result demonstrates that surface-level lexical features, while interpretable, are insufficient for graded sense plausibility prediction without deep semantic representations. Because the features are inspired by classical word sense disambiguation techniques and incorporate signals derived from human disagreement, the model's plausibility predictions remain largely interpretable. This negative result provides a useful baseline and insights for future work on graded word sense disambiguation.

Paradise at SemEval-2026 Task 12: Leveraging Instruction-Tuned Large Language Models with Chain-of-Thought Prompting for Abductive Event Reasoning
Dhruv Goyal | Ishita Gupta | Jatin Bedi
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We present Paradise, our system for SemEval-2026 Task 12: Abductive Event Reasoning, which identifies plausible direct causes of real-world English-language events using retrieved contextual documents. Our approach employs Qwen2.5-7B-Instruct, a 7-billion-parameter instruction-tuned language model, combined with carefully engineered chain-of-thought prompting. It requires no task-specific fine-tuning or training-data supervision (prompt components were selected using the development set). The system achieves a score of 0.79 on the official 612-instance test set by integrating explicit causal-inference rules, 4,000-character document context windows, and greedy decoding. Analysis reveals that conservative prediction patterns (87.1% single-label predictions and 36.9% Option D predictions) effectively exploit the asymmetric scoring metric. Ablation studies confirm that document context contributes +6.4 points, chain-of-thought reasoning +5.3 points, and explicit causal rules +3.1 points to development performance. Our code is publicly available at https://github.com/DhruvGoyal404/semeval2026-task12.

Co-authors

Jatin Bedi 5
Ishita Gupta 5
