基于BPE分词的中国古诗主题模型及主题可控的诗歌生成(Topic model and topic-controlled poetry generation of Chinese ancient poem based on BPE)

Jiarui Zhang (张家瑞), Wenhao Li (李文浩), Maosong Sun (孙茂松)


Abstract
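Classical Chinese poetry is a treasure of human culture: its concise language conveys remarkably rich meanings and themes, and it has drawn countless admirers from ancient times to the present. Working with a corpus of more than [illegible] × 10,000 classical poems, we segment the poems with the BPE algorithm according to co-occurrence frequency, so that downstream tasks can capture their semantics more accurately. We then run a latent Dirichlet allocation (LDA) model on the segmented corpus for topic analysis, and obtain a reasonably accurate topic model by comparing and adjusting the number of topics. Going further, we apply the topic model line by line to the jueju (quatrains) and lüshi (regulated verse) in the corpus, derive the topic transition matrix within a poem, and carry out some related analysis. Finally, we embed the topic model into a poetry generation model using a simple control-code method, which yields topic-controllable poetry generation and also tests the effectiveness of the trained topic model.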
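To make the segmentation step concrete, the following is a minimal sketch of BPE-style merging on poem lines, assuming each line starts as a sequence of single characters and the most frequent adjacent pair is merged at each step. It is not the authors' implementation: the function names `learn_bpe_merges` and `apply_merge`, the stopping rule, and the three-line toy corpus are all illustrative assumptions.

```python
from collections import Counter


def apply_merge(tokens, left, right):
    """Replace each adjacent (left, right) pair with the joined token, scanning left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(lines, num_merges):
    """Learn up to num_merges merges, always merging the most frequent adjacent pair."""
    # Assumption: every line starts out split into single characters.
    corpus = [list(line) for line in lines]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # illustrative stopping rule: no pair occurs more than once
        merges.append((left, right))
        corpus = [apply_merge(tokens, left, right) for tokens in corpus]
    return merges, corpus


# Hypothetical toy corpus of three well-known lines that share the bigram 明月.
toy_lines = ["明月松间照", "明月几时有", "举头望明月"]
merges, segmented = learn_bpe_merges(toy_lines, num_merges=1)
print(merges)
print(segmented)
```

On this toy corpus the only pair that occurs more than once is 明 followed by 月, so a single merge step turns it into one token. The resulting token lists are the kind of input a topic model such as LDA would consume in place of raw characters.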
Anthology ID:
2021.ccl-1.77
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
862–873
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.77
Cite (ACL):
Jiarui Zhang, Wenhao Li, and Maosong Sun. 2021. 基于BPE分词的中国古诗主题模型及主题可控的诗歌生成(Topic model and topic-controlled poetry generation of Chinese ancient poem based on BPE). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 862–873, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
基于BPE分词的中国古诗主题模型及主题可控的诗歌生成(Topic model and topic-controlled poetry generation of Chinese ancient poem based on BPE) (Zhang et al., CCL 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ccl-1.77.pdf