@inproceedings{zhang-etal-2021-ji-yu-bpefen,
title = "基于{BPE}分词的中国古诗主题模型及主题可控的诗歌生成(Topic model and topic-controlled poetry generation of {C}hinese ancient poem based on {BPE})",
author = "Zhang, Jiarui and
Li, Wenhao and
Sun, Maosong",
editor = "Li, Sheng and
Sun, Maosong and
Liu, Yang and
Wu, Hua and
Liu, Kang and
Che, Wanxiang and
He, Shizhu and
Rao, Gaoqi",
booktitle = "Proceedings of the 20th Chinese National Conference on Computational Linguistics",
month = aug,
year = "2021",
address = "Huhhot, China",
publisher = "Chinese Information Processing Society of China",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.ccl-1.77/",
pages = "862--873",
language = "zho",
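abstract = "Classical Chinese poetry is a treasure of human culture: its short, concise language conveys extremely rich meanings and themes, and it has attracted countless admirers from ancient times to the present. Taking a corpus of more than […]0,000 classical poems as the object of study, this paper uses the BPE algorithm to segment the poems according to co-occurrence frequency, so that downstream tasks can understand their semantics more accurately. We then apply a latent Dirichlet allocation (LDA) model to the segmented corpus for topic analysis, and obtain a topic model of relatively high accuracy by comparing and adjusting the number of topics. Going further, we apply the topic model line by line to the quatrains (jueju) and regulated verses (lüshi) in the corpus, obtain a topic transition matrix within a single poem, and carry out some related analysis. Finally, we use a simple control-code method to embed the topic model in a poetry generation model, achieving topic-controllable poetry generation and at the same time verifying the effectiveness of the trained topic model."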
}