Danqingxin Yang
Also published as: 丹清忻 杨
2024
面向工艺文本的实体与关系最近邻联合抽取模型(Nearest Neighbor Joint Extraction Model for Entity and Relationship in Process Text)
Danqingxin Yang (杨丹清忻) | Peiyan Wang (王裴岩) | Lijun Xu (徐立军)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
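This paper studies the joint extraction of entities and relations in process text and proposes a Nearest Neighbor Joint Extraction model (NNJE). NNJE builds an explicit memory from the collocation patterns between entity-boundary characters in process text. Using a nearest-neighbor method, it retrieves, under a specified relation, instances whose character collocations are similar to those of the candidate combination to be predicted. This provides stronger constraints for entity boundary recognition and entity-pair combination, improving the model's prediction precision and overall performance. The experiments use a process-text relation dataset. The results show that the proposed method improves precision (P) by 3.53% and F1 by 1.03% over the baseline model and outperforms PURE, CasRel, PRGC, and TPLinker, indicating that it effectively improves triple extraction.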
A Corpus and Method for Chinese Named Entity Recognition in Manufacturing
Ruiting Li | Peiyan Wang | Libang Wang | Danqingxin Yang | Dongfeng Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Manufacturing specifications are documents that describe the techniques, processes, and components involved in manufacturing. With the development of smart manufacturing, there is a growing demand for named entity recognition (NER) resources and techniques for manufacturing-specific named entities. In this paper, we introduce a corpus of Chinese manufacturing specifications, named MS-NERC, comprising 4,424 sentences and 16,383 entities. We also propose an entity recognizer named Trainable State Transducer (TST), which is initialized with a finite state transducer describing the morphological patterns of entities. It can directly recognize entities based on prior morphological knowledge without training. Experimental results show that TST achieves an overall F1 score of 82.05% for morphological-specific entities in the zero-shot setting. TST can be further improved through training, after which it outperforms neural methods in both few-shot and rich-resource settings. We believe that our corpus and model will be valuable resources for NER research not only in manufacturing but also in other low-resource domains.