Pawan Kumar
2026
Speculative Refinement: A Hybrid Autoregressive Diffusion Decoding Strategy and Its Behavior Across Benchmarks
Aditi Gupta | Neel Mishra | Kushagra Trivedi | Pawan Kumar
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
How should we evaluate generation systems that combine autoregressive (AR) and diffusion decoding? We study this question through *Speculative Refinement* (SpecRef), a training-free hybrid method that warm-starts a masked diffusion language model from an AR draft using entropy-guided selective masking. Evaluating SpecRef across six benchmarks (HumanEval, MBPP, GSM8K, BBH, ARC-Challenge, HellaSwag) with three distinct evaluation protocols (execution-based pass@1, exact-match, log-likelihood scoring), we surface several findings relevant beyond our specific system: (1) code benchmarks conflate structural discovery with logical correctness: providing a syntactic scaffold lifts accuracy from near zero to over 20% without changing the model, indicating that much of the baseline failure is structural; (2) a *refinement tension* phenomenon where multi-stage correction degrades already-correct tokens, exposing benchmark saturation ceilings invisible to single-model evaluation; (3) log-likelihood and generative evaluation produce different model rankings for the same model pair, suggesting they measure different capabilities; (4) standard Python post-processing silently breaks code evaluation for non-AR generators. These observations apply to any multi-stage or non-autoregressive generation pipeline and point toward more diagnostic evaluation practices.
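The abstract describes entropy-guided selective masking only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the sketch below masks those draft tokens whose predictive entropy exceeds a threshold, leaving positions that a diffusion model could later refill. The `MASK` string, the `threshold` value, the toy probability distributions, and both function names are assumptions made for this example.

```python
import math

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def selective_mask(draft_tokens, draft_probs, threshold):
    """Mask draft tokens whose predictive entropy exceeds `threshold`.

    draft_tokens: tokens proposed by the AR model
    draft_probs:  one probability distribution per draft position
    Returns the partially masked sequence and the per-position entropies.
    """
    entropies = [token_entropy(p) for p in draft_probs]
    masked = [
        MASK if h > threshold else tok
        for tok, h in zip(draft_tokens, entropies)
    ]
    return masked, entropies


# Toy draft: two confident positions and two uncertain ones.
draft = ["return", "a", "+", "b"]
probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain over 4 options
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain over 4 options
]

masked, entropies = selective_mask(draft, probs, threshold=1.0)
print(masked)
```

In this toy setting only the two uniform positions are masked; how the threshold is actually chosen, and how the diffusion model fills the masked positions, is not specified in the abstract.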
Theory-Explicit Prompting for MIND Self-States: Hierarchical LLMs and Dynamic Signature Extraction in Mental Health Timelines
Pawan Kumar | Ankit Meshram | Shubham Jha | Loitongbam Singh
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)
This paper presents a system for the CLPsych 2026 Shared Task on longitudinal mental health modeling from social media timelines, grounded in the MIND framework. MIND conceptualizes mental health as evolving self-states defined by Affect, Behavior, Cognition, and Desire (ABCD), providing a structured lens on mental health trajectories. The system centers on a theory-explicit prompting framework for structured sequence summarization (Task 3.1) and recurrent dynamic signature extraction (Task 3.2), encoding the full ABCD taxonomy directly into the LLM prompt to ensure clinically grounded, interpretable outputs. A three-stage pipeline infers a direction-of-change label per sequence, produces structured ABCD summaries with few-shot exemplar augmentation, and aggregates these summaries to derive cross-individual recurrent patterns. The system ranks first on deterioration-related recurrent signatures and second overall, achieving the top Fit and Specificity scores in Task 3.2, demonstrating the benefits of explicit clinical grounding for conceptual accuracy.
Semantically Aware Optimal Transport for Dense Label Transfer
Preeti | Kiran Ravish | Ankita Kushwaha | Pawan Kumar
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Vision foundation models produce features that generalize across visual domains without fine-tuning, yet naively transferring labels through these feature spaces fails under large distribution shifts. We propose SAOT (**S**emantically **A**ware **O**ptimal **T**ransport), which learns a transport cost within a fused unbalanced optimal transport formulation for dense label transfer from frozen vision transformer features to new domains. SAOT combines a learnable appearance metric with semantic class-prototype priors, unbalanced transport for partial matching under distribution shift, and a block-sparse solver for tractable inference. We pair this with a two-stage decoder: an MLP trained on SAOT pseudo-labels, then refined via EMA-teacher self-training with class-balanced sampling. On GTA5→Cityscapes with frozen DINOv2 ViT-L/14 features, SAOT+Decoder reaches 25.7% mIoU, a **3.8×** improvement over nearest-neighbor transfer (6.7%), without any backbone adaptation. Per-class results show large gains on spatially coherent classes (road 90.3%, car 76.2%, building 71.5%), demonstrating that learned semantic transport costs capture domain-invariant structure even under severe synthetic-to-real shifts. On VOC train→val with frozen ViT-B/16 features, the full pipeline reaches 47.5% mIoU, indicating that the approach extends beyond synthetic-to-real adaptation.
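To make the idea of label transfer through unbalanced transport more concrete, the following is a minimal sketch of generic entropic unbalanced Sinkhorn scaling with KL-relaxed marginals, followed by a per-target vote over transported mass. It is not SAOT itself: the cost here is a fixed squared distance on toy 1-D features rather than a learned metric, and there are no class-prototype priors, fused terms, or block-sparse solver. The function names, the `eps` and `rho` values, and the toy data are all assumptions.

```python
import math


def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, iters=200):
    """Entropic unbalanced OT via Sinkhorn-style scaling updates.

    eps is the entropic regularization strength and rho the KL penalty
    on marginal violations; both values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)  # exponent < 1 softens the marginal constraints
    for _ in range(iters):
        for i in range(n):
            Kv = sum(K[i][j] * v[j] for j in range(m))
            u[i] = (a[i] / Kv) ** power
        for j in range(m):
            Ktu = sum(K[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / Ktu) ** power
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transfer_labels(plan, source_labels, num_classes):
    """Give each target the class that sends it the most transported mass."""
    labels = []
    for j in range(len(plan[0])):
        mass = [0.0] * num_classes
        for i, lab in enumerate(source_labels):
            mass[lab] += plan[i][j]
        labels.append(max(range(num_classes), key=mass.__getitem__))
    return labels


# Toy 1-D "features": two source clusters with known labels, four targets.
source = [0.0, 0.1, 1.0, 1.1]
source_labels = [0, 0, 1, 1]
target = [0.05, 0.15, 0.95, 1.05]

cost = [[(s - t) ** 2 for t in target] for s in source]
a = [1.0 / len(source)] * len(source)
b = [1.0 / len(target)] * len(target)

plan = unbalanced_sinkhorn(cost, a, b)
labels = transfer_labels(plan, source_labels, num_classes=2)
print(labels)
```

Because the marginals are only softly enforced, the total mass of `plan` need not equal one, which is the property that allows partial matching when source and target distributions differ.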