融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition)
Yuxiang Jia (贾玉祥), Rui Chao (晁睿), Hongying Zan (昝红英), Huayi Dou (窦华溢), Shuai Cao (曹帅), Shuo Xu (徐硕)
Abstract
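Named entity recognition is foundational to the intelligent analysis of literary works, yet research on named entity recognition in the literary domain remains weak, largely because annotated corpora are lacking. Starting from Jin Yong's novels, this paper annotates named entities in two novels totaling more than 1.8 million characters, labeling over 50,000 entities across 4 types. To address the characteristics of novel text, we propose a named entity recognition model that incorporates document-level information: it introduces a document dictionary to store the historical states of Chinese characters and uses a confidence computation to fuse a BiGRU-CRF model with a Transformer model. Experimental results show that using document-level information effectively improves named entity recognition performance. Finally, we also discuss the application of named entity recognition to constructing social networks from novels.
- Anthology ID: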
- 2021.ccl-1.54
- Volume:
- Proceedings of the 20th Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2021
- Address:
- Huhhot, China
- Editors:
- Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 600–611
- Language:
- Chinese
- URL:
- https://preview.aclanthology.org/remove-affiliations/2021.ccl-1.54/
- Cite (ACL):
- Yuxiang Jia, Rui Chao, Hongying Zan, Huayi Dou, Shuai Cao, and Shuo Xu. 2021. 融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 600–611, Huhhot, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 融入篇章信息的文学作品命名实体识别(Document-level Literary Named Entity Recognition) (Jia et al., CCL 2021)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2021.ccl-1.54.pdf