Arshad Khatib

2026

Clutch or Cry at SemEval-2026 Task 12: Offline Retrieval-Augmented Generation with Frozen DeBERTa for Abductive Event Reasoning
Aayush Prasad | Rudra Trivedi | Arshad Khatib | Shrikant Malviya | Naveen Kumar
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We present our system for SemEval-2026 Task 12 on abductive event reasoning. Initial experiments with direct fine-tuning of large language models suffered from severe overfitting due to limited training data, while smaller models failed under context-length constraints, leading to random guessing under the strict Exact Match evaluation metric. To address these challenges, we propose a two-stage offline Retrieval-Augmented Generation (RAG) pipeline that separates semantic evidence retrieval from multi-label classification. We employ a dense retriever (all-MiniLM-L6-v2) to extract the single most relevant sentence (top-k=1) and feed it into a partially frozen DeBERTa-v3-Large classifier trained with BCEWithLogitsLoss. Freezing the lower 12 layers effectively mitigates overfitting while preserving pre-trained semantic knowledge. Our approach eliminates long-context truncation issues, reduces hallucination, and achieves a final Exact Match accuracy of 0.72 on the official test set.
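
The retrieve-then-classify flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The toy vectors stand in for all-MiniLM-L6-v2 sentence embeddings, and the toy logits stand in for the outputs of the DeBERTa-v3-Large classifier. The function names `top1_sentence`, `predict_labels`, and `exact_match`, the 0.5 decision threshold, and the four-label setup are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top1_sentence(query_vec, sentence_vecs):
    """Return the index of the single most similar sentence (top-k = 1)."""
    return max(range(len(sentence_vecs)),
               key=lambda i: cosine(query_vec, sentence_vecs[i]))

def predict_labels(logits, threshold=0.5):
    """Multi-label decision: sigmoid each logit, then threshold independently.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]

def exact_match(predictions, golds):
    """Fraction of examples whose full label vector equals the gold vector."""
    hits = sum(1 for p, g in zip(predictions, golds) if p == g)
    return hits / len(golds)

# Toy embeddings (hypothetical values, not real model outputs).
query = [1.0, 0.0, 1.0]
sentences = [[0.0, 1.0, 0.0], [0.9, 0.1, 1.1], [-1.0, 0.0, -1.0]]
best = top1_sentence(query, sentences)

# Toy logits for two examples with four labels each (assumed label count).
preds = [predict_labels([2.0, -1.5, 0.3, -3.0]),
         predict_labels([-0.2, 1.1, -0.7, 0.4])]
golds = [[1, 0, 1, 0], [0, 1, 0, 0]]
score = exact_match(preds, golds)
```

In this toy run the second example gets three of its four labels right but still counts as a miss, which is why Exact Match is a strict metric for multi-label outputs.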

2025

“Clutch or Cry” Team at TRACS @ WASP2025: A Hybrid Stacking Ensemble for Astrophysical Document Classification
Arshad Khatib | Aayush Prasad | Rudra Trivedi | Shrikant Malviya
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications

Automatically identifying telescopes and their roles within astrophysical literature is crucial for large-scale scientific analysis and tracking instrument usage patterns. This paper describes the system developed by the “Clutch or Cry” team for the Telescope Reference and Astronomy Categorization Shared task (TRACS) at WASP 2025. The task involved two distinct challenges: multi-class telescope identification (Task 1) and multi-label role classification (Task 2). For Task 1, we employed a feature-centric approach combining document identifiers, metadata, and textual features to achieve high accuracy. For the more complex Task 2, we utilized a carefully designed two-level stacking ensemble. This hybrid model effectively fused symbolic information from a rule-based classifier with deep semantic understanding from a domain-adapted transformer. A subsequent meta-learning stage then performed targeted optimization for each role. These architectures were designed to address the primary challenges of handling long documents and managing severe class imbalance. A systematic optimization strategy focused on mitigating this imbalance significantly improved performance for minority classes. This work validates the effectiveness of using tailored, hybrid approaches and targeted optimization for complex classification tasks in specialized scientific domains.
