Pseudo-Likelihood Training for Reasoning Diffusion Language Models

Shiv Shankar


Abstract
Policy-gradient reinforcement learning (PGRL) forms the backbone of current methods for enhancing alignment and reasoning in Large Language Models (LLMs). However, these methods are incompatible with diffusion-based language models (dLLMs): most attempts to apply PGRL to dLLMs are either not scalable or rely on unprincipled approximations. This work introduces PADRE, a framework that uses a novel pseudo-likelihood-based objective for aligning dLLMs. Our objective shares its optima with PGRL-based optimization but does not require evaluating exact likelihoods from dLLMs. Experiments on various coding and mathematical reasoning benchmarks show that our method matches or surpasses recent dLLM training baselines such as diffu-GRPO/d1. Our approach provides a stable and practical alternative for RL-based fine-tuning of reasoning-focused dLLMs.
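Since the abstract only names the idea, the following is a minimal sketch of what a pseudo-likelihood surrogate for policy-gradient fine-tuning of a masked diffusion LM could look like. The model interface, the MASK_ID constant, and the advantage-weighted loss below are illustrative assumptions, not the paper's actual PADRE objective.

import torch

# Hypothetical mask-token id; a real masked dLLM tokenizer defines its own.
MASK_ID = 0


def pseudo_log_likelihood(model, tokens):
    """Besag-style pseudo-log-likelihood: sum of per-token conditional
    log-probs, each token predicted with itself masked. This avoids the
    intractable exact sequence likelihood of a diffusion LM."""
    seq_len = tokens.size(0)
    total = torch.zeros((), dtype=torch.float32)
    for i in range(seq_len):
        masked = tokens.clone()
        masked[i] = MASK_ID
        # Assumed interface: model maps (1, seq_len) ids to (1, seq_len, vocab) logits.
        logits = model(masked.unsqueeze(0)).squeeze(0)
        log_probs = torch.log_softmax(logits[i], dim=-1)
        total = total + log_probs[tokens[i]]
    return total


def pseudo_likelihood_pg_loss(model, sampled_tokens, advantage):
    """REINFORCE-style surrogate in which the advantage-weighted
    pseudo-log-likelihood stands in for the exact dLLM log-likelihood."""
    pll = pseudo_log_likelihood(model, sampled_tokens)
    return -(advantage * pll)

In a GRPO-style setup, `advantage` would be a group-normalized reward for the sampled completion; minimizing this loss then raises the pseudo-likelihood of high-reward samples, mirroring the policy-gradient update without exact likelihood evaluation.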
Anthology ID:
2026.eacl-long.257
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
5514–5529
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.257/
Cite (ACL):
Shiv Shankar. 2026. Pseudo-Likelihood Training for Reasoning Diffusion Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5514–5529, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Pseudo-Likelihood Training for Reasoning Diffusion Language Models (Shankar, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.257.pdf