Yanhua Wang

Also published as: 艳华



2024

融合领域词汇扩充的低资源法律文书命名实体识别(Named Entity Recognition for Low-Resource Legal Documents Using Integrated Domain Vocabulary Expansion)
Tulajiang Paerhati (帕尔哈提吐拉江) | Yuanyuan Sun (孙嫒媛) | Aichen Cai (蔡艾辰) | Yanhua Wang (王艳华) | Hongfei Lin (林鸿飞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

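“Research on named entity recognition for low-resource legal documents in the judicial domain based on pre-trained language models currently faces two main problems: (1) In low-resource languages such as Uyghur, corpora of legal documents are extremely limited, and this scarcity constrains the training and performance of approaches based on pre-trained language models. (2) The terminology used in legal documents is both complex and highly specific, and the emergence of new legal terms and concepts makes it difficult for existing models to adapt. To address these problems, this paper builds on the multilingual pre-trained model mBERT and improves its performance on named entity recognition for Uyghur legal documents through domain vocabulary expansion and model fine-tuning. The paper first compiles a list of specialized Uyghur judicial-domain vocabulary and adds it to the vocabulary of the mBERT model. The model is then fine-tuned on UgLaw-NERD, a manually annotated named entity dataset of Uyghur legal documents, which validates the effectiveness of the method. Experimental results show that, compared with a baseline that only fine-tunes mBERT, the model with domain vocabulary expansion raises the F1 score on named entity recognition to 89.72%, an improvement of 7.39% over the baseline. In addition, the paper examines how the amount of added domain vocabulary affects named entity recognition performance; the results show that domain vocabulary expansion enhances the pre-trained model's performance on Uyghur tasks. These conclusions provide a useful reference for research on pre-trained-model-based natural language processing in the judicial domain for other low-resource languages.”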
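
As a rough illustration of why adding domain terms to a WordPiece-style vocabulary can help, the following minimal sketch splits a word greedily (longest match first) before and after expansion. It is a toy under stated assumptions, not the paper's implementation: the function `wordpiece`, the vocabularies, and the English placeholder term "plaintiff" are all hypothetical, and no mBERT model or UgLaw-NERD data is involved.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces.

    Non-initial pieces carry the "##" continuation prefix, as in
    BERT-style vocabularies. Returns [unk] if the word cannot be covered.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical general-purpose vocabulary that lacks the legal term.
base_vocab = {"pla", "##int", "##iff"}

# Domain vocabulary expansion: add the whole term as a single token.
expanded_vocab = base_vocab | {"plaintiff"}

before = wordpiece("plaintiff", base_vocab)
after = wordpiece("plaintiff", expanded_vocab)
print(before, after)
```

In this toy setting the term is fragmented into three pieces before expansion and kept as one token afterwards. In a real model, each newly added token would also need a new embedding row, which is learned during fine-tuning.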