Shiv Shankar

2026

Energy Matching based Preference Learning for Diffusion Language Models
Shiv Shankar
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Policy-gradient reinforcement learning (RL) is widely used to improve language model reasoning, but existing methods are not compatible with diffusion language models (dLLMs). The primary reason is the difficulty of likelihood estimation with such models. We propose EMBR, a scalable off-policy framework that reformulates KL-regularized RL as an energy-based distribution matching problem. By aligning policy updates with reward signals through energy matching, EMBR avoids the overhead of on-policy learning and the variance of importance weighting. We further derive a principled upper bound for the energy matching objective, which can be used to fine-tune dLLMs. Experiments on multiple benchmarks in both online and offline settings show that EMBR matches or surpasses the performance of diffu-GRPO and related baselines in the online case, and of DPO in the offline case. Our approach provides a practical alternative for post-training of diffusion LMs.

Pseudo-Likelihood Training for Reasoning Diffusion Language Models
Shiv Shankar
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Policy-gradient reinforcement learning (PGRL) forms the backbone of current methods used to enhance alignment and reasoning in Large Language Models (LLMs). However, these methods are incompatible with diffusion-based language models (dLLMs). Most attempts to apply PGRL to dLLMs are either not scalable or use unprincipled approximations. This work introduces PADRE, a framework that uses a novel pseudo-likelihood-based objective for alignment of dLLMs. Our objective has the same optima as PGRL-based optimization, but does not need to evaluate exact likelihoods under dLLMs. Experiments on various coding and mathematical reasoning benchmarks show that our method matches or surpasses the performance of recent dLLM training baselines such as diffu-GRPO/d1. Our approach provides a stable and practical alternative for RL-based fine-tuning of reasoning-focused dLLMs.

Learning Shortcut Models for Efficient Recursive Reasoning
Shiv Shankar
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Recursive models that progressively refine latent representations have demonstrated strong performance on a variety of reasoning tasks. However, these models only control whether and when to stop early, not how computation is distributed. In this work, we introduce shortcut reasoning, a framework for distilling recursive latent reasoning into a multiscale jump model that enables flexible test-time compute. We reinterpret recursive reasoning as a latent-time dynamical process and train a student model to predict the effect of multiple reasoning steps at once. To ensure robustness, we augment shortcut transitions with a repair mechanism, where a denoising variant of the base model projects latent states back onto a valid reasoning manifold. We further introduce stepwise improvement supervision, encouraging each shortcut step to increase the likelihood of the correct answer. Experiments on ARC-AGI show that our approach achieves competitive accuracy compared to recursive baselines while requiring fewer sequential updates.

2022

Multimodal fusion via cortical network inspired losses
Shiv Shankar
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Information integration from different modalities is an active area of research. Human beings and, in general, biological neural systems are quite adept at using a multitude of signals from different sensory perceptive fields to interact with the environment and each other. Recent work in deep fusion models via neural networks has led to substantial improvements over unimodal approaches in areas like speech recognition, emotion recognition and analysis, captioning and image description. However, such research has mostly focused on architectural changes allowing for fusion of different modalities while keeping the model complexity manageable. Inspired by neuroscientific ideas about multisensory integration and processing, we investigate the effect of introducing neural dependencies in the loss functions. Experiments on multimodal sentiment analysis tasks with different models show that our approach provides a consistent performance boost.

2018

Surprisingly Easy Hard-Attention for Sequence to Sequence Learning
Shiv Shankar | Siddhant Garg | Sunita Sarawagi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence-to-sequence learning. The method combines the advantage of sharp focus in hard attention and the implementation ease of soft attention. On five translation tasks, we show effortless and consistent gains in BLEU compared to existing attention mechanisms.

Co-authors

Siddhant Garg 1
Sunita Sarawagi 1
