TESS 2: A Large-Scale Generalist Diffusion Language Model

Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan


Abstract
We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models and matches, and sometimes exceeds, strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with a diffusion loss, then performing further instruction tuning. We find that both the adaptation training and the choice of base model are crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure that aligns model outputs without training the underlying model. Finally, we show that TESS 2 improves further with increased inference-time compute, highlighting the utility of diffusion LMs' fine-grained control over the amount of compute used at inference time.
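The reward guidance procedure described in the abstract admits a compact illustration. The following is a minimal, hypothetical sketch of a classifier-guidance-style update, assuming a diffusion LM that predicts clean token logits from a noisy simplex input and a differentiable reward model that scores soft token distributions; the names diffusion_model, reward_model, and guidance_scale are illustrative assumptions, not the paper's actual interface.

    # A minimal sketch of inference-time reward guidance, assuming a
    # classifier-guidance-style update on the model's predicted logits.
    # diffusion_model, reward_model, and guidance_scale are hypothetical
    # stand-ins, not the actual TESS 2 API.
    import torch

    @torch.enable_grad()
    def guided_denoise_step(diffusion_model, reward_model, noisy_logits,
                            timestep, guidance_scale=1.0):
        # Predict clean token logits from the noisy simplex input.
        pred_logits = diffusion_model(noisy_logits, timestep)
        pred_logits = pred_logits.detach().requires_grad_(True)
        # Score the relaxed (soft) token distribution with the reward model.
        reward = reward_model(torch.softmax(pred_logits, dim=-1)).sum()
        # Nudge the prediction toward higher reward; the underlying
        # diffusion model itself is never trained or modified.
        grad = torch.autograd.grad(reward, pred_logits)[0]
        return pred_logits.detach() + guidance_scale * grad

Because the guidance term only adjusts the prediction at each denoising step, the reward model can in principle be swapped out without retraining anything, which is what makes such a procedure modular.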
Anthology ID: 2025.acl-long.1029
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 21171–21188
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1029/
Cite (ACL): Jaesung Tae, Hamish Ivison, Sachin Kumar, and Arman Cohan. 2025. TESS 2: A Large-Scale Generalist Diffusion Language Model. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21171–21188, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): TESS 2: A Large-Scale Generalist Diffusion Language Model (Tae et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1029.pdf