CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition)

Xinghua Zhang (张兴华), Tianjun Liu (刘天昀), Wenyuan Zhang (张文源), Tingwen Liu (柳厅文)


Abstract
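Named entity recognition (NER) aims to automatically identify entities with specific meaning in text (for example, person names and place names). NER for ancient Chinese texts identifies entities such as person names, book titles, and official positions, and so provides important support for mining and organizing the humanistic knowledge recorded in classical Chinese. Existing Chinese NER methods focus mainly on modern Chinese, but entity recognition in ancient texts is more challenging in two respects: entity ambiguity and fuzzy entity boundaries. Because ancient texts are written concisely, single-character expressions aggravate entity ambiguity, and the greater difficulty of punctuation and word segmentation makes entity boundaries harder to identify. To address these problems, this paper proposes an NER method for ancient texts based on information theory and document-level (篇章) information. The method retrieves the source of each ancient text to incorporate document-level prior knowledge, and applies sliding-window sampling augmentation over texts from the same document to introduce document background information, which effectively alleviates entity ambiguity. In addition, from an information-theoretic perspective, the method constrains the encoding of both the entity's context and the entity's own features, retaining generalizable features as far as possible and removing redundant information. This alleviates the fuzzy-boundary problem and improves NER performance on classical texts with complex, varied word meanings and difficult punctuation. Finally, built on a basic NER framework with token-wise and span-level awareness, the proposed method achieved the best performance in the evaluation.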
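The sliding-window sampling augmentation mentioned in the abstract could look roughly like the following. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `sliding_window_samples`, the parameters `source`, `window`, and `stride`, the bracketed source prefix, and the placeholder sentences are all hypothetical and do not come from the paper.

```python
from typing import List


def sliding_window_samples(
    sentences: List[str],
    source: str,
    window: int = 3,
    stride: int = 1,
) -> List[str]:
    """Build context-augmented samples from consecutive sentences of one document.

    Hypothetical sketch: each sample joins `window` neighbouring sentences and
    is prefixed with the retrieved source title, so a model sees both the
    document-level prior and the local document background.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    if not sentences:
        return []

    # Last start index at which a full window still fits (0 for short inputs).
    last_start = max(len(sentences) - window, 0)
    starts = list(range(0, last_start + 1, stride))
    # Make sure the final sentences are covered even if stride skips past them.
    if starts[-1] != last_start:
        starts.append(last_start)

    samples = []
    for start in starts:
        chunk = "".join(sentences[start:start + window])
        samples.append(f"[{source}]{chunk}")  # assumed prefix format
    return samples


# Usage with placeholder sentences (illustrative only, not real corpus data).
demo = sliding_window_samples(["甲。", "乙。", "丙。", "丁。"], source="某书", window=2)
```

Overlapping windows mean that each sentence appears with several different neighbours, which is one plausible way to expose the model to more document context per entity mention.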
Anthology ID:
2023.ccl-3.1
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1–13
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-3.1
Cite (ACL):
Xinghua Zhang, Tianjun Liu, Wenyuan Zhang, and Tingwen Liu. 2023. CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), pages 1–13, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
CCL23-Eval 任务1系统报告:基于信息论约束及篇章信息的古籍命名实体识别(System Report for CCL23-Eval Task 1: Information Theory Constraint and Paragraph based Paragraph Classical Named Entity Recognition) (Zhang et al., CCL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.ccl-3.1.pdf