Kunli Zhang


2022

pdf
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Ningyu Zhang | Mosha Chen | Zhen Bi | Xiaozhuan Liang | Lei Li | Xin Shang | Kangping Yin | Chuanqi Tan | Jian Xu | Fei Huang | Luo Si | Yuan Ni | Guotong Xie | Zhifang Sui | Baobao Chang | Hui Zong | Zheng Yuan | Linfeng Li | Jun Yan | Hongying Zan | Kunli Zhang | Buzhou Tang | Qingcai Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually offering great promise for medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.

2021

pdf
糖尿病电子病历实体及关系标注语料库构建(Construction of Corpus for Entity and Relation Annotation of Diabetes Electronic Medical Records)
Yajuan Ye (叶娅娟) | Bin Hu (胡斌) | Kunli Zhang (张坤丽) | Hongying Zan (昝红英)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

电子病历是医疗信息的重要来源,包含大量与医疗相关的领域知识。本文从糖尿病电子病历文本入手,在调研了国内外已有的电子病历语料库的基础上,参考坉圲坂圲实体及关系分类,建立了糖尿病电子病历实体及实体关系分类体系,并制定了标注规范。利用实体及关系标注平台,进行了实体及关系预标注及多轮人工校对工作,形成了糖尿病电子病历实体及关系标注语料库(Diabetes Electronic Medical Record entity and Related Corpus DEMRC)。所构建的DEMRC包含8899个实体、456个实体修饰及16564个关系。对DEMRC进行一致性评价和分析,标注结果达到了较高的一致性。针对实体识别和实体关系抽取任务,分别采用基于迁移学习的Bi-LSTM-CRF模型和RoBERTa模型进行初步实验,并对语料库中的各类实体及关系进行评估,为后续糖尿病电子病历实体识别及关系抽取研究以及糖尿病知识图谱构建打下基础。

pdf
脑卒中疾病电子病历实体及实体关系标注语料库构建(Corpus Construction for Named-Entity and Entity Relations for Electronic Medical Records of Stroke Disease)
Hongyang Chang (常洪阳) | Hongying Zan (昝红英) | Yutuan Ma (马玉团) | Kunli Zhang (张坤丽)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

本文探讨了在脑卒中疾病中文电子病历文本中实体及实体间关系的标注问题,提出了适用于脑卒中疾病电子病历文本的实体及实体关系标注体系和规范。在标注体系和规范的指导下,进行了多轮的人工标注及校正工作,完成了158万余字的脑卒中电子病历文本实体及实体关系的标注工作。构建了脑卒中电子病历实体及实体关系标注语料库(Stroke Electronic Medical Record entity and entity related Corpus SEMRC)。所构建的语料库共包含命名实体10594个,实体关系14457个。实体名标注一致率达到85.16%,实体关系标注一致率达到94.16%。

2020

pdf
面向医学文本处理的医学实体标注规范(Medical Entity Annotation Standard for Medical Text Processing)
Huan Zhang (张欢) | Yuan Zong (宗源) | Baobao Chang (常宝宝) | Zhifang Sui (穗志方) | Hongying Zan (昝红英) | Kunli Zhang (张坤丽)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

随着智慧医疗的普及,利用自然语言处理技术识别医学信息的需求日益增长。目前,针对医学实体而言,医学共享语料库仍处于空白状态,这对医学文本信息处理各项任务的进展造成了巨大阻力。如何判断不同的医学实体类别?如何界定不同实体间的涵盖范围?这些问题导致缺乏类似通用场景的大规模规范标注的医学文本数据。针对上述问题,该文参考了UMLS中定义的语义类型,提出面向医学文本信息处理的医学实体标注规范,涵盖了疾病、临床表现、医疗程序、医疗设备等9种医学实体,以及基于规范构建医学实体标注语料库。该文综述了标注规范的描述体系、分类原则、混淆处理、语料标注过程以及医学实体自动标注基线实验等相关问题,希望能为医学实体语料库的构建提供可参考的标注规范,以及为医学实体识别提供语料支持。

pdf
Konwledge-Enabled Diagnosis Assistant Based on Obstetric EMRs and Knowledge Graph
Kunli Zhang | Xu Zhao | Lei Zhuang | Qi Xie | Hongying Zan
Proceedings of the 19th Chinese National Conference on Computational Linguistics

The obstetric Electronic Medical Record (EMR) contains a large amount of medical data and health information. It plays a vital role in improving the quality of the diagnosis assistant service. In this paper, we treat the diagnosis assistant as a multi-label classification task and propose a Knowledge-Enabled Diagnosis Assistant (KEDA) model for the obstetric diagnosis assistant. We utilize the numerical information in EMRs and the external knowledge from Chinese Obstetric Knowledge Graph (COKG) to enhance the text representation of EMRs. Specifically, the bidirectional maximum matching method and similarity-based approach are used to obtain the entities set contained in EMRs and linked to the COKG. The final knowledge representation is obtained by a weight-based disease prediction algorithm, and it is fused with the text representation through a linear weighting method. Experiment results show that our approach can bring about +3.53 F1 score improvements upon the strong BERT baseline in the diagnosis assistant task.