Jian Wang
2026
Reinforcement Learning for Diffusion LLMs via Energy-Based Gibbs Alignment
Yijia Fan | Jing Yang | Mingyu Liu | Kaitong Cai | Jian Wang | Keze Wang | Jusheng Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm for text generation, offering parallel decoding and bidirectional context modeling. However, aligning dLLMs with reinforcement learning (RL) remains a significant challenge: the marginal likelihood of sequences in masked diffusion is typically intractable, which makes standard policy gradient methods unstable or computationally prohibitive. In this work, we propose **Diffusion-Gibbs Alignment (DGA)**, a novel variational framework that reformulates RL for dLLMs as a distribution matching problem. DGA bypasses the explicit computation of log-probabilities by using a learned energy function to model the relative quality of samples. The optimization is decoupled into two stable steps: (1) contrastive energy ranking to capture global reward structures, and (2) weighted diffusion alignment to update the policy via importance sampling. Empirically, DGA achieves new state-of-the-art results on logical reasoning (Sudoku, Countdown), mathematical reasoning (GSM8K, Math500), and code generation (HumanEval, MBPP) benchmarks. DGA offers a new variational perspective on dLLM alignment, improving performance while also increasing training speed and memory efficiency.
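The abstract describes DGA only at a high level. As a rough illustration of the two steps it names, the following generic sketch (not the authors' implementation) assumes a group of sampled completions whose rewards, learned energies, and per-sample denoising losses have already been computed. Step (1) is modeled here as a pairwise logistic ranking loss that pushes higher-reward samples toward lower energy; step (2) is modeled as self-normalized Gibbs weights w_i ∝ exp(-E_i / T) that reweight the denoising losses. The temperature T, the logistic pair loss, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(energies, rewards):
    """Hypothetical step (1): logistic ranking loss over all sample pairs.

    For every pair with different rewards, penalize the model unless the
    higher-reward sample has the lower energy. Tied pairs are skipped.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(rewards)), 2):
        if rewards[i] == rewards[j]:
            continue  # ties carry no ranking signal
        better, worse = (i, j) if rewards[i] > rewards[j] else (j, i)
        margin = energies[worse] - energies[better]  # want this to be large
        total += softplus(-margin)
        count += 1
    return total / count if count else 0.0


def gibbs_weights(energies, temperature=1.0):
    """Self-normalized Gibbs weights w_i proportional to exp(-E_i / T)."""
    scaled = [-e / temperature for e in energies]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [x / z for x in exps]


def weighted_alignment_loss(denoising_losses, energies, temperature=1.0):
    """Hypothetical step (2): importance-weighted sum of per-sample losses."""
    weights = gibbs_weights(energies, temperature)
    return sum(w * loss for w, loss in zip(weights, denoising_losses))
```

In this sketch, lower-energy samples receive larger weights, so the update concentrates on completions the energy model ranks highest without evaluating any sequence log-probability. For example, `gibbs_weights([-2.0, 1.0, 0.0])` assigns the largest weight to the first sample and the smallest to the second.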
2025
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
Jusheng Zhang | Yijia Fan | Kaitong Cai | Zimeng Huang | Xiaofei Sun | Jian Wang | Chengpei Tang | Keze Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core techniques. First, we design a dynamic expert scheduling mechanism that allocates computational resources during the diffusion process according to text complexity, so that generation tasks of varying difficulty are handled more efficiently. Second, we introduce a Hierarchical Sparse Attention (HSA) mechanism that adapts its attention patterns to the input length, reducing computational complexity from O(n²) to O(n) while maintaining model performance. Finally, we propose a Semantic Anchor States (SAS) module that, combined with DPM-Solver++, reduces the number of diffusion steps and significantly improves generation speed. Comprehensive experiments on various long-text generation benchmarks show that DrDiff outperforms existing state-of-the-art methods.
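The abstract does not say how HSA reaches linear cost. One standard way to obtain O(n) attention is to restrict each query to a fixed-size local window, so the work per token is bounded by the window size rather than by the sequence length. The pure-Python sketch below illustrates only that generic sliding-window idea; `local_window_attention`, `attended_pairs`, and the `window` parameter are hypothetical names, and this is not the paper's hierarchical or length-adaptive scheme.

```python
import math


def local_window_attention(q, k, v, window):
    """Sliding-window softmax attention (generic illustration, not HSA).

    q, k, v are lists of float vectors of the same length n. Position i
    attends only to keys j with |i - j| <= window, so the total cost is
    O(n * (2 * window + 1) * d): linear in n for a fixed window.
    """
    n, d = len(q), len(q[0])
    dv = len(v[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against the local keys only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(lo, hi)]
        m = max(scores)  # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        # Convex combination of the local value vectors.
        out.append([sum(exps[t] * v[lo + t][c] for t in range(hi - lo)) / z
                    for c in range(dv)])
    return out


def attended_pairs(n, window):
    """Number of (query, key) scores computed for length n and a window."""
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))
```

With `window = 1` and `n = 4`, `attended_pairs` counts 10 scores instead of the 16 needed by full attention; the gap grows linearly with n because each query scores at most 2 * window + 1 keys.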