Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts
Yang Liu, Yuanhe Tian, Tsung-Hui Chang, Song Wu, Xiang Wan, Yan Song
Abstract
Chinese word segmentation (CWS) and medical concept recognition are two fundamental tasks in processing Chinese electronic medical records (EMRs), and both play important roles in downstream tasks for understanding Chinese EMRs. One challenge for these tasks is the lack of medical-domain datasets with high-quality annotations, especially medical-related tags that reveal the characteristics of Chinese EMRs. In this paper, we collected a Chinese EMR corpus, namely ACEMR, with human annotations for Chinese word segmentation and EMR-related tags. On the ACEMR corpus, we run well-known models (i.e., BiLSTM, BERT, and ZEN) and existing state-of-the-art systems (e.g., WMSeg and TwASP) for CWS and medical concept recognition. Experimental results demonstrate the necessity of building a dedicated medical dataset and show that models that leverage extra resources achieve the best performance on both tasks, which offers some guidance for model selection in future studies in the medical domain.
- Anthology ID:
- 2021.bionlp-1.23
- Volume:
- Proceedings of the 20th Workshop on Biomedical Language Processing
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 213–220
- URL:
- https://aclanthology.org/2021.bionlp-1.23
- DOI:
- 10.18653/v1/2021.bionlp-1.23
- Cite (ACL):
- Yang Liu, Yuanhe Tian, Tsung-Hui Chang, Song Wu, Xiang Wan, and Yan Song. 2021. Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 213–220, Online. Association for Computational Linguistics.
- Cite (Informal):
- Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts (Liu et al., BioNLP 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.bionlp-1.23.pdf
- Code:
- cuhksz-nlp/acemr