Long Text Generation with Topic-aware Discrete Latent Variable Model
Erguang Yang, Mingtong Liu, Deyi Xiong, Yujie Zhang, Yufeng Chen, Jinan Xu
Abstract
Generating coherent long texts is an important yet challenging task, particularly for open-ended generation. Prior work based on discrete latent codes focuses on modeling discourse relations, so the codes learn only shallow semantics (Ji and Huang, 2021). A natural text revolves around several related topics, with natural and smooth transitions across them. In this work, we investigate whether discrete latent codes can learn topic information. To this end, we build a topic-aware latent code-guided text generation model. To encourage the discrete codes to model topic information, we propose a span-level bag-of-words training objective. Automatic and manual evaluation shows that our method generates more topic-relevant and coherent texts.
- Anthology ID:
- 2022.emnlp-main.554
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 8100–8107
- URL:
- https://aclanthology.org/2022.emnlp-main.554
- DOI:
- 10.18653/v1/2022.emnlp-main.554
- Cite (ACL):
- Erguang Yang, Mingtong Liu, Deyi Xiong, Yujie Zhang, Yufeng Chen, and Jinan Xu. 2022. Long Text Generation with Topic-aware Discrete Latent Variable Model. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8100–8107, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Long Text Generation with Topic-aware Discrete Latent Variable Model (Yang et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.554.pdf
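The abstract's span-level bag-of-words objective can be illustrated with a minimal sketch: the embedding of the discrete code assigned to a span is projected to vocabulary logits, and the loss is the negative log-likelihood of the words occurring in that span, ignoring word order. All names, shapes, and the projection setup below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def span_bow_loss(code_emb, proj, span_word_ids):
    """Span-level bag-of-words loss (sketch, hypothetical names).

    code_emb: (d,) embedding of the discrete latent code for one span.
    proj: (d, V) projection matrix mapping the code to vocabulary logits.
    span_word_ids: token ids of the words appearing in the span.
    """
    logits = code_emb @ proj                            # (V,) vocab scores
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    # Order-independent: average negative log-prob over the span's words.
    return -float(np.mean(log_probs[span_word_ids]))

# Toy usage: code dimension 4, vocabulary of 10, a span with three tokens.
rng = np.random.default_rng(0)
loss = span_bow_loss(rng.normal(size=4), rng.normal(size=(4, 10)), [2, 5, 5])
```

Because the objective scores the set of words in a span rather than a token sequence, gradients push the code embedding toward the span's topical vocabulary instead of its surface order, which is the intuition behind using it to make discrete codes topic-aware.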