Baoyou Chen

2026

T⋆: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning
Hanchen Xia | Baoyou Chen | Yutang Ge | Guojiang Zhao | Siyu Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present T⋆, a simple TraceRL-based curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T⋆ gradually increases the block size while re-optimizing the denoising policy at each stage, enabling higher-parallelism decoding with limited degradation on math reasoning benchmarks. Across two SDAR scales and three benchmarks, T⋆ consistently outperforms direct large-block TraceRL and is substantially more stable during training. Our schedule analysis suggests that the learned policy does not simply revert to a strictly left-to-right order. Instead, it retains block-size-specific non-monotone updates while improving accuracy.
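The curriculum described above (start small, enlarge the block, re-optimize at each stage) can be sketched as a loop over a block-size schedule. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the starting or target block sizes or the growth rule, so the doubling schedule is an assumption, and `block_schedule`, `block_steps`, `progressive_curriculum`, and `train_stage` are hypothetical names. `train_stage` is a placeholder for one TraceRL re-optimization stage, which is not implemented here. `block_steps` assumes block-wise decoding in which blocks are produced one after another, so a larger block means fewer sequential block steps for the same output length.

```python
from math import ceil
from typing import Callable, List, Tuple, TypeVar

Policy = TypeVar("Policy")


def block_schedule(start: int, target: int, factor: int = 2) -> List[int]:
    """Hypothetical geometric schedule of block sizes from start up to target (inclusive)."""
    if start <= 0 or target < start or factor < 2:
        raise ValueError("need 0 < start <= target and factor >= 2")
    sizes = [start]
    while sizes[-1] < target:
        # Grow by `factor`, but never overshoot the target block size.
        sizes.append(min(sizes[-1] * factor, target))
    return sizes


def block_steps(gen_len: int, block_size: int) -> int:
    """Blocks decoded in sequence, assuming blocks are generated one after another."""
    return ceil(gen_len / block_size)


def progressive_curriculum(
    start: int,
    target: int,
    train_stage: Callable[[Policy, int], Policy],
    policy: Policy,
) -> Tuple[Policy, List[int]]:
    """Re-optimize the policy once per block size, smallest block first.

    `train_stage` is a hypothetical stand-in for one TraceRL stage at a fixed block size.
    """
    history: List[int] = []
    for b in block_schedule(start, target):
        policy = train_stage(policy, b)  # placeholder: re-optimize at block size b
        history.append(b)
    return policy, history


if __name__ == "__main__":
    for b in block_schedule(4, 32):
        print(f"block size {b:>2}: {block_steps(256, b)} sequential block steps for 256 tokens")
```

With the assumed values, `block_schedule(4, 32)` yields `[4, 8, 16, 32]`, and the number of sequential block steps for a 256-token output drops from 64 to 8 across the stages. Staging the increase, rather than jumping straight to the largest block, mirrors the comparison in the abstract against direct large-block TraceRL.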
