Jeng-Yue Liu
2026
Probing Functional Correctness in Diffusion Language Models
Guan-Ming Chiu | Jeng-Yue Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Diffusion language models (DLMs) generate text by iteratively denoising all tokens in parallel, but when and where their hidden states encode whether the output will be functionally correct remains unknown. We present the first probing study of DLM internals, training linear classifiers on hidden states to predict functional correctness. Across two models (LLaDA-8B, Dream-7B) and four tasks, we find that DLMs uniquely accumulate correctness signal across denoising steps (AUC gains of 0.08–0.11 on reasoning tasks), a buildup that is absent in single-pass autoregressive (AR) decoding. However, the signal present at step 0 reflects prompt difficulty rather than diffusion-specific computation. Signal emergence is task-dependent: structural tasks show flat profiles, while reasoning tasks show gradual buildup. The two models exhibit distinct layer dynamics: LLaDA concentrates signal in its upper layers, while Dream redistributes it toward lower layers. We further show that probe confidence can identify likely failures, enabling selective generation that avoids 36–98% of wasted compute.
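The probing setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden states are synthetic toy vectors, not activations from LLaDA or Dream, and every name (`train_linear_probe`, `probe_score`, `roc_auc`, `flag_likely_failures`, `make_toy_states`) and hyperparameter is a hypothetical choice made for illustration. The sketch fits a logistic-regression probe on the vectors, scores held-out examples by AUC, and flags low-confidence generations as candidates for early termination.

```python
import math
import random


def probe_score(w, b, x):
    """Probability that the final output is correct, according to the linear probe."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = probe_score(w, b, x) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(scores, labels):
    """AUC as the fraction of (correct, incorrect) pairs ranked properly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def flag_likely_failures(scores, threshold=0.5):
    """Indices of generations whose probe confidence falls below the threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


def make_toy_states(n, dim=8, shift=2.0, seed=0):
    """Toy stand-in for hidden states: one coordinate carries the correctness signal."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = functionally correct, 0 = incorrect
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += shift if y == 1 else -shift
        xs.append(x)
        ys.append(y)
    return xs, ys


# Train on one synthetic split, evaluate on another.
train_x, train_y = make_toy_states(200, seed=0)
test_x, test_y = make_toy_states(200, seed=1)
w, b = train_linear_probe(train_x, train_y)
test_scores = [probe_score(w, b, x) for x in test_x]
print("held-out AUC:", round(roc_auc(test_scores, test_y), 3))
print("flagged as likely failures:", len(flag_likely_failures(test_scores)))
```

In a real study one would presumably fit a separate probe for each (layer, denoising step) pair and compare the resulting AUC values to trace where and when the signal emerges. The threshold in `flag_likely_failures` trades compute saved against correct generations discarded, and the value of 0.5 here is arbitrary.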