CCL23-Eval 任务1系统报告:基于增量预训练与对抗学习的古籍命名实体识别 (System Report for CCL23-Eval Task 1: GuNER Based on Incremental Pretraining and Adversarial Learning)

Jianlong Li (李剑龙), Youren Yu (于右任), Xueyang Liu (刘雪阳), Siwen Zhu (朱思文)


Abstract
Named entity recognition (NER) in ancient Chinese texts is a fundamental step in correctly analyzing and processing Classical Chinese, and an important prerequisite for the deep mining and organization of humanities knowledge. Classical Chinese has high information entropy and is difficult to understand, so technical progress in this field has been slow. To address the poor robustness to perturbations and the inaccurate entity-boundary recognition of existing NER models, this paper proposes combining NEZHA-TCN with a global pointer (GlobalPointer) for ancient-text NER. We also construct an ancient-Chinese corpus containing texts from the official dynastic histories, totaling 87 MB and 397,995 lines of text, which is used for incremental pretraining of the NEZHA-TCN model. During training, to strengthen the model's robustness to perturbations, the fast gradient method (FGM) is introduced to add perturbations to the word embedding layer. Experimental results show that the proposed method can effectively mine the entity information hidden in ancient texts, achieving an F1 score of 95.34%.
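The abstract mentions using the fast gradient method (FGM) to perturb the word embedding layer during training. The snippet below is a minimal PyTorch sketch of that style of adversarial training, not the authors' implementation: the toy model, the parameter names (emb_name, epsilon), and the training step are illustrative assumptions standing in for the paper's NEZHA-TCN + GlobalPointer setup.

```python
import torch
import torch.nn as nn


class FGM:
    """Fast Gradient Method: perturb embedding weights along the L2-normalized
    loss gradient, run an extra adversarial backward pass, then restore them."""

    def __init__(self, model: nn.Module, emb_name: str = "embedding", epsilon: float = 1.0):
        self.model = model
        self.emb_name = emb_name   # substring identifying embedding parameters (assumption)
        self.epsilon = epsilon     # perturbation radius (assumption)
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    # r_adv = epsilon * g / ||g||
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}


if __name__ == "__main__":
    # Toy token-classification model standing in for the paper's NER model.
    model = nn.Sequential(nn.Embedding(100, 16), nn.Flatten(0, 1), nn.Linear(16, 4))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    fgm = FGM(model, emb_name="0.weight", epsilon=0.5)  # "0.weight" is the Embedding here

    tokens = torch.randint(0, 100, (2, 8))    # (batch, seq_len)
    labels = torch.randint(0, 4, (2 * 8,))    # flattened token labels
    criterion = nn.CrossEntropyLoss()

    loss = criterion(model(tokens), labels)
    loss.backward()                           # gradients on clean inputs

    fgm.attack()                              # perturb embedding weights
    adv_loss = criterion(model(tokens), labels)
    adv_loss.backward()                       # accumulate adversarial gradients
    fgm.restore()                             # undo the perturbation

    optimizer.step()
    optimizer.zero_grad()
```

The key design point of FGM-style training is that the perturbation is applied to the embedding weights rather than the discrete input tokens, so both the clean and adversarial losses contribute gradients before a single optimizer step.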
Anthology ID:
2023.ccl-3.3
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
Month:
August
Year:
2023
Address:
Harbin, China
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
23–33
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-3.3
Cite (ACL):
Jianlong Li, Youren Yu, Xueyang Liu, and Siwen Zhu. 2023. CCL23-Eval 任务1系统报告:基于增量预训练与对抗学习的古籍命名实体识别 (System Report for CCL23-Eval Task 1: GuNER Based on Incremental Pretraining and Adversarial Learning). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), pages 23–33, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
CCL23-Eval 任务1系统报告:基于增量预训练与对抗学习的古籍命名实体识别 (System Report for CCL23-Eval Task 1: GuNER Based on Incremental Pretraining and Adversarial Learning) (Li et al., CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-3.3.pdf