2021
pdf
abs
Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions
Po-Han Chen
|
Yu-Xiang Zeng
|
Lung-Hao Lee
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. Finally, a total of 1,814 questions with 2,340 labels. Each question contains an average of 1.29 labels. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers BERT and RoBERTa were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with knowledge infusion mechanism can achieve better performance, no matter which evaluation metric including Macro F1, Micro F1, Weighted F1 or Subset Accuracy were considered.
pdf
abs
NCUEE-NLP at MEDIQA 2021: Health Question Summarization Using PEGASUS Transformers
Lung-Hao Lee
|
Po-Han Chen
|
Yu-Xiang Zeng
|
Po-Lei Lee
|
Kuo-Kai Shyu
Proceedings of the 20th Workshop on Biomedical Language Processing
This study describes the model design of the NCUEE-NLP system for the MEDIQA challenge at the BioNLP 2021 workshop. We use the PEGASUS transformers and fine-tune the downstream summarization task using our collected and processed datasets. A total of 22 teams participated in the consumer health question summarization task of MEDIQA 2021. Each participating team was allowed to submit a maximum of ten runs. Our best submission, achieving a ROUGE2-F1 score of 0.1597, ranked third among all 128 submissions.
2020
pdf
abs
Medication Mention Detection in Tweets Using ELECTRA Transformers and Decision Trees
Lung-Hao Lee
|
Po-Han Chen
|
Hao-Chuan Kao
|
Ting-Chun Hung
|
Po-Lei Lee
|
Kuo-Kai Shyu
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
This study describes our proposed model design for the SMM4H 2020 Task 1. We fine-tune ELECTRA transformers using our trained SVM filter for data augmentation, along with decision trees to detect medication mentions in tweets. Our best F1-score of 0.7578 exceeded the mean score 0.6646 of all 15 submitting teams.
2019
pdf
abs
NCUEE at MEDIQA 2019: Medical Text Inference Using Ensemble BERT-BiLSTM-Attention Model
Lung-Hao Lee
|
Yi Lu
|
Po-Han Chen
|
Po-Lei Lee
|
Kuo-Kai Shyu
Proceedings of the 18th BioNLP Workshop and Shared Task
This study describes the model design of the NCUEE system for the MEDIQA challenge at the ACL-BioNLP 2019 workshop. We use the BERT (Bidirectional Encoder Representations from Transformers) as the word embedding method to integrate the BiLSTM (Bidirectional Long Short-Term Memory) network with an attention mechanism for medical text inferences. A total of 42 teams participated in natural language inference task at MEDIQA 2019. Our best accuracy score of 0.84 ranked the top-third among all submissions in the leaderboard.