Peiyan Wang

Also published as: 裴岩


2025

Do LLMs Know and Understand Domain Conceptual Knowledge?
Sijia Shen | Feiyan Jiang | Peiyan Wang | Yubo Feng | Yuchen Jiang | Chang Liu
Findings of the Association for Computational Linguistics: EMNLP 2025

This paper focuses on the task of generating concept sememe trees to study whether Large Language Models (LLMs) can understand and generate domain conceptual knowledge. A concept sememe tree is a hierarchical structure that represents lexical meaning by combining sememes and their relationships. To this end, we introduce the Neighbor Semantic Structure (NSS) and a Chain-of-Thought (CoT) prompting method to evaluate the effectiveness of various LLMs in generating accurate and comprehensive sememe trees across different domains. The NSS, guided by conceptual metaphors, identifies terms that exhibit significant external systematicity within a hierarchical relational network and incorporates them as examples in the learning process of LLMs. Meanwhile, the CoT prompting method guides LLMs through a systematic analysis of a term's intrinsic core concepts, essential attributes, and semantic relationships, enabling the generation of concept sememe trees. We conduct experiments using datasets drawn from four authoritative terminology manuals and evaluate different LLMs. The experimental results indicate that LLMs possess the capability to capture and represent the conceptual knowledge of domain-specific terms. Moreover, the integration of NSS examples with a structured CoT process allows LLMs to explore domain conceptual knowledge more deeply, leading to the generation of highly accurate concept sememe trees.
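The combination of neighbor examples and a stepwise CoT prompt described above can be sketched as a prompt-assembly routine. This is a minimal illustration, not the paper's actual prompt: the function name, the example fields, and the four step descriptions are all assumptions.

```python
# Hypothetical sketch: assemble a CoT prompt that shows NSS neighbor terms
# (terms with known sememe trees) as in-context examples, then walks the
# LLM through core concept -> attributes -> relations before the tree.

def build_sememe_tree_prompt(term, nss_examples):
    lines = ["You will build a concept sememe tree for a domain term."]
    for ex in nss_examples:  # neighbor terms retrieved by the NSS
        lines.append(f"Example term: {ex['term']}")
        lines.append(f"Example sememe tree: {ex['tree']}")
    lines += [
        f"Target term: {term}",
        "Step 1: Identify the term's intrinsic core concept.",
        "Step 2: List its essential attributes.",
        "Step 3: State the semantic relations linking attributes to the core.",
        "Step 4: Output the concept sememe tree.",
    ]
    return "\n".join(lines)

prompt = build_sememe_tree_prompt(
    "laser welding",
    [{"term": "arc welding", "tree": "{weld: instrument={arc}}"}],
)
```

The example tree notation is only a placeholder; in practice it would follow the sememe-tree formalism used by the evaluation data.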

2024

NNP-TDGM: 基于最近邻提示表征的术语DEF生成模型(NNP-TDGM: Nearest Neighbor Prompt Term DEF Generation Model)
Sijia Shen (沈思嘉) | Peiyan Wang (王裴岩) | Shengren Wang (王胜任) | Libang Wang (王立帮)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

This paper studies the automatic generation of term DEFs under the HowNet knowledge base description language grammar, proposing the Nearest Neighbor Prompt Term DEF Generation Model (NNP-TDGM). The term DEFs in the training set are organized into an explicit memory set; when the decoder generates a (first) sememe or a relation, the model retrieves the core concepts, important attributes, and relation types contained in terms whose conceptual structure is identical or similar to that of the term being predicted, assisting the model in completing the DEF and alleviating insufficient decoder training on low-frequency samples. In addition, semantic representation vectors of the concept information contained in terms and term definitions are obtained by prompting a pre-trained language model, mitigating the encoder's limited representation capability. Experiments show that NNP-TDGM achieves a sememe-relation-sememe triple F1 of 31.84%, a relation F1 of 53.12%, a sememe F1 of 51.55%, and a first-sememe F1 of 68.53%, improvements of 3.38%, 1.45%, 1.08%, and 0.48% over the baseline methods, respectively.
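The explicit-memory lookup can be sketched as a nearest-neighbor search over stored term representations, reusing the retrieved DEF fragment as a decoding hint. The vectors, terms, and DEF fragments below are toy data, and the cosine-similarity retrieval is only an assumed stand-in for the model's actual representation and lookup.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_def(query_vec, memory):
    """memory: list of (term, vector, def_fragment) built from the training set.
    Return the entry whose vector is closest to the query term's vector."""
    return max(memory, key=lambda entry: cosine(query_vec, entry[1]))

memory = [
    ("drill", [0.9, 0.1, 0.0], "{tool: use={perforate}}"),
    ("lathe", [0.1, 0.9, 0.0], "{tool: use={cut}}"),
]
term, _, hint = nearest_def([0.85, 0.2, 0.05], memory)
# the query vector is closest to "drill", so its DEF fragment is the hint
```

In the paper the retrieved core concepts, attributes, and relation types guide the decoder at each generation step; here retrieval simply returns the whole stored fragment.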

面向工艺文本的实体与关系最近邻联合抽取模型(Nearest Neighbor Joint Extraction Model for Entity and Relationship in Process Text)
Danqingxin Yang (杨丹清忻) | Peiyan Wang (王裴岩) | Lijun Xu (徐立军)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

This paper studies joint extraction of entities and relations from process texts and proposes a nearest neighbor joint extraction model (NNJE). NNJE builds an explicit memory from the collocation patterns of characters at entity boundaries in process texts; using a nearest-neighbor method, it retrieves, under a given relation, instances with character collocations similar to the candidate combination being predicted, providing stronger constraints for entity boundary recognition and entity-pair combination and thereby improving prediction accuracy and overall performance. A relation dataset of process texts was constructed for the experiments. The results show that the method improves precision (P) by 3.53% and F1 by 1.03% over the baseline, outperforming PURE, CasRel, PRGC, and TPlinker, indicating that the proposed method effectively improves triple extraction.
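The boundary-collocation idea can be sketched as scoring a candidate entity span by how often its boundary character pairs occur in a memory of known instances. The pair extraction, the scoring rule, and the data below are all illustrative assumptions, not the model's actual formulation.

```python
# Hypothetical sketch: a candidate span is characterized by the character
# pair straddling each boundary (last context char + first span char, and
# last span char + first context char); the memory counts such pairs.

def boundary_pairs(span_text, context_before, context_after):
    """Return (left, right) boundary character pairs for a candidate span."""
    left = (context_before[-1:] or "^") + span_text[:1]
    right = span_text[-1:] + (context_after[:1] or "$")
    return left, right

def collocation_score(candidate, memory_pairs):
    """Sum the memory counts of the candidate's two boundary pairs."""
    left, right = candidate
    return memory_pairs.get(left, 0) + memory_pairs.get(right, 0)

memory_pairs = {"用铣": 3, "刀加": 2, "用钻": 1}
cand = boundary_pairs("铣刀", context_before="使用", context_after="加工")
score = collocation_score(cand, memory_pairs)
# boundary pairs ("用铣", "刀加") score 3 + 2 = 5
```

A higher score means the candidate's boundaries resemble those of known entities under the relation, which is the kind of constraint the abstract describes.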

面向中文实体识别的Transformers模型句子级非对抗鲁棒性研究(On Sentence-level Non-adversarial Robustness of Chinese Named Entity Recognition with Transformers Model)
Libang Wang (王立帮) | Peiyan Wang (王裴岩) | Sijia Shen (沈思嘉)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

Transformer-based Chinese named entity recognition models achieve excellent performance on standard NER benchmarks, and their robustness has drawn wide attention. However, the sentence-level non-adversarial robustness problems that Chinese NER models face in real-world deployment remain under-studied; this paper addresses that gap. First, we theoretically analyze and identify the negative effects of self-attention, relative position embeddings, and absolute position embeddings in Transformers on model robustness. We then propose robustness-enhancement methods based on entity label augmentation and a sliding-window constraint, and theoretically prove that they improve the NER robustness of Transformer models. Finally, through experiments on three Chinese datasets, we study the vulnerability of four Transformer-based NER models; the proposed methods improve the robustness F1 by up to 4.95%.
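One way to picture a sliding-window constraint at inference time is to tag overlapping windows instead of the whole sentence, so position-dependent effects cannot propagate across the full input, then merge the per-window labels. This is only a hedged sketch of the general idea, not the paper's method; `tag_window` stands in for any Transformer tagger and is stubbed out here.

```python
# Illustrative sliding-window tagging: overlapping windows are tagged
# independently and each token takes the majority label over its windows.

def windows(tokens, size, stride):
    start = 0
    while True:
        yield start, tokens[start:start + size]
        if start + size >= len(tokens):
            break  # the last window reached the end of the sentence
        start += stride

def tag_with_windows(tokens, size=4, stride=2, tag_window=None):
    # stub tagger: labels everything "O"; a real model would go here
    tag_window = tag_window or (lambda toks: ["O"] * len(toks))
    votes = [{} for _ in tokens]
    for start, chunk in windows(tokens, size, stride):
        for offset, label in enumerate(tag_window(chunk)):
            pos = start + offset
            votes[pos][label] = votes[pos].get(label, 0) + 1
    # majority vote per token over the windows that covered it
    return [max(v, key=v.get) for v in votes]

tags = tag_with_windows(list("中文实体识别"))
```

The overlap (stride smaller than window size) ensures every token is seen in more than one positional context, which is what makes the merged labels less sensitive to where the token happens to sit in the sentence.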

A Corpus and Method for Chinese Named Entity Recognition in Manufacturing
Ruiting Li | Peiyan Wang | Libang Wang | Danqingxin Yang | Dongfeng Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Manufacturing specifications are documents detailing the techniques, processes, and components involved in manufacturing. With the development of smart manufacturing, there is a growing demand for named entity recognition (NER) resources and techniques for manufacturing-specific named entities. In this paper, we introduce a corpus of Chinese manufacturing specifications, named MS-NERC, comprising 4,424 sentences and 16,383 entities. We also propose an entity recognizer named Trainable State Transducer (TST), which is initialized with a finite state transducer describing the morphological patterns of entities. It can recognize entities directly from prior morphological knowledge, without training. Experimental results show that TST achieves an overall 82.05% F1 score for morphology-specific entities in the zero-shot setting. TST can be further improved through training, after which it outperforms neural methods in both few-shot and rich-resource settings. We believe that our corpus and model will be valuable resources for NER research not only in manufacturing but also in other low-resource domains.
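Recognizing entities from hand-specified morphological patterns, with no training, can be sketched with a tiny state machine. The pattern below (one or more digit tokens followed by a unit token) and every name in the code are made-up examples in the spirit of the approach, not the paper's actual transducer or rule set.

```python
# Illustrative pattern-based recognizer: accept token runs matching
# DIGIT+ UNIT and emit them as MEASUREMENT entity spans.

UNITS = {"mm", "cm", "kg"}  # toy unit vocabulary

def match_measurement(tokens, start):
    """Try to match DIGIT+ UNIT at tokens[start:]; return the end index
    (exclusive) on success, or None if the pattern does not match."""
    i = start
    while i < len(tokens) and tokens[i].isdigit():
        i += 1
    if i > start and i < len(tokens) and tokens[i] in UNITS:
        return i + 1
    return None

def recognize(tokens):
    """Scan left to right, greedily emitting non-overlapping spans."""
    spans, i = [], 0
    while i < len(tokens):
        end = match_measurement(tokens, i)
        if end:
            spans.append((i, end, "MEASUREMENT"))
            i = end
        else:
            i += 1
    return spans

spans = recognize(["thickness", "2", "5", "mm", "steel"])
# the digit run "2 5" plus the unit "mm" forms one MEASUREMENT span
```

A transducer built this way needs no labeled data for entity types whose surface form is regular, which matches the zero-shot behavior the abstract reports; the paper's TST additionally makes such a machine trainable.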