CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition)
Xinghua Zhang (张兴华), Tianjun Liu (刘天昀), Wenyuan Zhang (张文源), Tingwen Liu (柳厅文)
Abstract
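Named entity recognition (NER) aims to automatically identify entities with specific meanings in text (e.g., person names and place names). In classical Chinese texts, NER identifies entities such as person names, books, and official titles, providing important support for mining and organizing humanistic knowledge of ancient Chinese. Existing Chinese NER methods focus mainly on modern Chinese, but entity recognition in classical texts is harder in two respects: entity ambiguity and fuzzy entity boundaries. Because classical texts are written concisely, single-character expressions aggravate entity ambiguity, and the greater difficulty of punctuation, word segmentation, and sentence segmentation makes entity boundaries harder to identify. To address these problems, this paper proposes a classical Chinese NER method based on information theory and document-level information. It retrieves the source of each classical text to incorporate document-level prior knowledge, and applies sliding-window sampling augmentation over texts from the same document to introduce background context, which alleviates entity ambiguity. In addition, from an information-theoretic perspective, it constrains the encoding of both an entity's context and the entity's own features, retaining generalizable features as far as possible while removing redundant information. This alleviates the fuzzy-boundary problem and improves NER performance on classical texts with complex, varied word senses and difficult punctuation. Finally, built on a basic NER framework with token-wise and span-level awareness, the proposed method achieved the best evaluation performance.
- Anthology ID: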
- 2023.ccl-3.1
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 1–13
- Language:
- Chinese
- URL:
- https://aclanthology.org/2023.ccl-3.1
- Cite (ACL):
- Xinghua Zhang, Tianjun Liu, Wenyuan Zhang, and Tingwen Liu. 2023. CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), pages 1–13, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition) (Zhang et al., CCL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.ccl-3.1.pdf
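
The abstract mentions sliding-window sampling augmentation over texts from the same document but gives no details. The following is a minimal sketch of one way such sampling could work, not the authors' implementation: the function name `sliding_window_samples`, the `window` and `stride` parameters, and the choice to concatenate neighbouring sentences directly are all assumptions made for illustration.

```python
from typing import List


def sliding_window_samples(
    sentences: List[str], window: int = 3, stride: int = 1
) -> List[str]:
    """Build overlapping samples from consecutive sentences of one document.

    Hypothetical helper: each sample joins `window` neighbouring sentences so
    that every sentence is seen together with its surrounding context.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not sentences:
        return []
    # A document shorter than the window yields a single sample.
    if len(sentences) <= window:
        return ["".join(sentences)]

    last_start = len(sentences) - window
    starts = list(range(0, last_start + 1, stride))
    # Make sure the final sentences are covered when stride skips past them.
    if starts[-1] != last_start:
        starts.append(last_start)

    return ["".join(sentences[s:s + window]) for s in starts]
```

With `window=2` and the default stride, four sentences produce three overlapping samples, so each interior sentence appears once with its preceding and once with its following neighbour.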