融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition)

Yuxiang Jia (贾玉祥), Rui Chao (晁睿), Hongying Zan (昝红英), Huayi Dou (窦华溢), Shuai Cao (曹帅), Shuo Xu (徐硕)


Abstract
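Named entity recognition (NER) is a foundational task for the intelligent analysis of literary works, yet research on NER in the literary domain remains limited, largely because annotated corpora are scarce. Starting from Jin Yong's novels, we annotate named entities in two novels totaling more than 1.8 million characters, labeling over 50,000 entities of 4 types. To address the characteristics of fiction text, we propose an NER model that incorporates document-level information: it introduces a document-level dictionary that stores the historical states of Chinese characters and uses confidence computation to fuse BiGRU-CRF and Transformer models. Experimental results show that document-level information effectively improves NER performance. Finally, we discuss the application of NER to constructing social networks from novels.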
Anthology ID:
2021.ccl-1.54
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
600–611
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.54
Cite (ACL):
Yuxiang Jia, Rui Chao, Hongying Zan, Huayi Dou, Shuai Cao, and Shuo Xu. 2021. 融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 600–611, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition) (Jia et al., CCL 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ccl-1.54.pdf