RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model

Ming-Hsiang Su, Chin-Wei Lee, Chi-Lun Hsu, Ruei-Cyuan Su


Abstract
This study builds a named entity recognition (NER) model for identifying the names of Chinese medicines and diseases. The results can be used in a human-machine dialogue system to give people correct Chinese medicine medication reminders. First, we use web crawlers to compile web resources into a Chinese medicine named entity corpus comprising 1,097 articles, 1,412 disease names, and 38,714 Chinese medicine names. Next, we annotate each article using the traditional Chinese medicine (TCM) names and the BIO tagging method. Finally, we train and evaluate BERT, ALBERT, RoBERTa, and GPT2 with BiLSTM and CRF. The experimental results show that the NER system combining RoBERTa with BiLSTM and CRF performs best, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.
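The abstract does not spell out how the collected names are turned into BIO labels. The following is a minimal sketch of one plausible approach, assuming character-level tagging by greedy longest-match lookup against a name lexicon. The function name `bio_tag`, the labels `MED` and `DIS`, and the sample sentence are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List


def bio_tag(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Assign one BIO tag per character by greedy longest-match lookup.

    `lexicon` maps an entity name to a label. The labels used here
    ("MED", "DIS") are hypothetical, not the paper's tag set.
    """
    tags = ["O"] * len(text)
    # Try longer names first so a long name is not split by a shorter one.
    names = sorted(lexicon, key=len, reverse=True)
    i = 0
    while i < len(text):
        for name in names:
            if name and text.startswith(name, i):
                label = lexicon[name]
                tags[i] = f"B-{label}"          # first character of the entity
                for j in range(i + 1, i + len(name)):
                    tags[j] = f"I-{label}"      # remaining characters
                i += len(name)
                break
        else:
            i += 1                              # no entity starts here
    return tags


# Illustrative lexicon and sentence (assumed, not from the corpus).
lexicon = {"人參": "MED", "感冒": "DIS"}
sentence = "文章提到人參和感冒"
tags = bio_tag(sentence, lexicon)
for char, tag in zip(sentence, tags):
    print(char, tag)
```

Each character receives `B-` at the start of a matched name, `I-` inside it, and `O` elsewhere. Sequences labeled this way are the usual training input for a token-classification model with a CRF output layer.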
Anthology ID:
2022.rocling-1.8
Volume:
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Venue:
ROCLING
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:
61–66
Language:
Chinese
URL:
https://aclanthology.org/2022.rocling-1.8
Cite (ACL):
Ming-Hsiang Su, Chin-Wei Lee, Chi-Lun Hsu, and Ruei-Cyuan Su. 2022. RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 61–66, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model (Su et al., ROCLING 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.rocling-1.8.pdf