DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer

Haozhe Ji, Minlie Huang


Abstract
Despite recent advances in applying pre-trained language models to generate high-quality text, generating long passages that maintain long-range coherence remains challenging for these models. In this paper, we propose DiscoDVT, a discourse-aware discrete variational Transformer, to tackle the incoherence issue. DiscoDVT learns a discrete variable sequence that summarizes the global structure of the text and then applies it to guide the generation process at each decoding step. To further embed discourse-aware information into the discrete latent representations, we introduce an auxiliary objective to model the discourse relations within the text. We conduct extensive experiments on two open story generation datasets and demonstrate that the latent codes learn meaningful correspondence to the discourse structures that guide the model to generate long texts with better long-range coherence.
Anthology ID:
2021.emnlp-main.347
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4208–4224
URL:
https://aclanthology.org/2021.emnlp-main.347
DOI:
10.18653/v1/2021.emnlp-main.347
Cite (ACL):
Haozhe Ji and Minlie Huang. 2021. DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4208–4224, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer (Ji & Huang, EMNLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.347.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.347.mp4
Code:
cdjhz/discodvt
Data:
BookCorpus, WritingPrompts