DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models

Zhengfu He; Tianxiang Sun; Qiong Tang; Kuanning Wang; Xuan-Jing Huang; Xipeng Qiu

doi:10.18653/v1/2023.acl-long.248

Abstract

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the two powerful models and enjoy the best of both worlds. On the one hand, diffusion models offer a promising training strategy that helps improve the generation quality. On the other hand, pre-trained denoising language models (e.g., BERT) can be used as a good initialization that accelerates convergence. We explore training BERT to learn the reverse process of a discrete diffusion process with an absorbing state and elucidate several designs to improve it. First, we propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step based on the information of each token. Second, we investigate several designs of incorporating the time step into BERT. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text (e.g., D3PM and Diffusion-LM) and previous generative masked language models in terms of perplexity and BLEU score. Promising results in conditional generation tasks show that DiffusionBERT can generate texts of comparable quality and more diverse than a series of established baselines.
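To make the phrase "discrete diffusion process with an absorbing state" concrete, here is a minimal sketch of the forward (noising) direction only. It assumes a simple uniform schedule in which every token is independently replaced by a mask symbol with probability t/T at step t. That schedule is an illustrative assumption, not the token-information-based schedule the abstract proposes, and the function name `forward_absorbing` and the `[MASK]` string are hypothetical.

```python
import random

MASK = "[MASK]"  # the absorbing state: once a token is masked, it stays masked


def forward_absorbing(tokens, t, T, rng):
    """Sample a noised sequence x_t from the clean sequence x_0.

    Assumption: a uniform schedule, so each token is independently
    replaced by MASK with marginal probability t / T.
    """
    p = t / T
    return [MASK if rng.random() < p else tok for tok in tokens]


rng = random.Random(0)
x0 = "the cat sat on the mat".split()
for t in (0, 4, 8):
    # t = 0 leaves the text intact; t = T masks every token.
    print(t, forward_absorbing(x0, t, 8, rng))
```

A model trained on the reverse process would receive such partially masked sequences and predict the original tokens, which is the same denoising task that BERT-style pre-training already performs.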

Anthology ID: 2023.acl-long.248
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4521–4534
URL: https://aclanthology.org/2023.acl-long.248
DOI: 10.18653/v1/2023.acl-long.248
Cite (ACL): Zhengfu He, Tianxiang Sun, Qiong Tang, Kuanning Wang, Xuanjing Huang, and Xipeng Qiu. 2023. DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4521–4534, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models (He et al., ACL 2023)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2023.acl-long.248.pdf
