2022
Enhancing Chinese Multi-Label Text Classification Performance with Response-based Knowledge Distillation
Szu-Chi Huang | Cheng-Fu Cao | Po-Hsun Liao | Lung-Hao Lee | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Optimizing per-label performance in multi-label text classification is difficult, especially on imbalanced data with long-tailed label distributions. This study therefore proposes a response-based knowledge distillation mechanism comprising a teacher model that optimizes binary classifiers for the corresponding labels and a student model, a standalone multi-label classifier, that learns from the distilled knowledge passed by the teacher. A total of 2,724 Chinese healthcare texts were collected and manually annotated across nine defined labels, yielding 8,731 label annotations, or an average of 3.2 labels per text. We used 5-fold cross-validation to compare the performance of several multi-label models, including TextRNN, TextCNN, HAN, and GRU-att. Experimental results indicate that the proposed knowledge distillation mechanism effectively improved performance regardless of the model used, with gains of about 2-3% in micro-F1, 4-6% in macro-F1, 3-4% in weighted-F1, and 1-2% in subset accuracy.
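The abstract does not specify the exact distillation objective, so below is a minimal PyTorch sketch of one common response-based formulation, in which temperature-softened per-label teacher probabilities serve as soft targets alongside the gold annotations. The temperature and weighting factor alpha are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a response-based distillation loss for multi-label
# classification; the exact objective used in the paper is an assumption here.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with a per-label soft-target loss.

    student_logits: (batch, num_labels) scores from the multi-label student
    teacher_logits: (batch, num_labels) scores from the per-label binary teachers
    gold_labels:    (batch, num_labels) 0/1 annotations (nine labels in this study)
    """
    # Hard loss: standard multi-label BCE against the gold annotations.
    hard = F.binary_cross_entropy_with_logits(student_logits, gold_labels)
    # Soft loss: match the teacher's temperature-softened label probabilities.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft = F.binary_cross_entropy_with_logits(student_logits / temperature,
                                              soft_targets)
    return alpha * hard + (1.0 - alpha) * soft
```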
2021
NCUEE-NLP at MEDIQA 2021: Health Question Summarization Using PEGASUS Transformers
Lung-Hao Lee | Po-Han Chen | Yu-Xiang Zeng | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the 20th Workshop on Biomedical Language Processing
This study describes the model design of the NCUEE-NLP system for the MEDIQA challenge at the BioNLP 2021 workshop. We use PEGASUS transformers, fine-tuning them on the downstream summarization task with our collected and processed datasets. A total of 22 teams participated in the consumer health question summarization task of MEDIQA 2021, with each team allowed a maximum of ten runs. Our best submission, achieving a ROUGE2-F1 score of 0.1597, ranked third among all 128 submissions.
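As a rough illustration of the summarization setup, here is a minimal inference sketch using the Hugging Face Transformers library; the checkpoint name google/pegasus-large, the example question, and the generation settings are illustrative assumptions, not the team's actual configuration.

```python
# Minimal PEGASUS inference sketch for health question summarization;
# checkpoint and decoding parameters are assumptions, not the paper's setup.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-large"  # illustrative checkpoint choice
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

question = ("I have had headaches every morning for two weeks and "
            "over-the-counter painkillers do not help. What could cause this?")
inputs = tokenizer(question, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```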
Classification of Tweets Self-reporting Adverse Pregnancy Outcomes and Potential COVID-19 Cases Using RoBERTa Transformers
Lung-Hao Lee | Man-Chen Hung | Chien-Huan Lu | Chang-Hao Chen | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
This study describes our proposed model design for the SMM4H 2021 shared tasks. We fine-tune RoBERTa transformers and their associated classification layer to complete the tweet classification tasks for adverse pregnancy outcomes (Task 4) and potential COVID-19 cases (Task 5). The evaluation metric for both tasks is the F1-score of the positive class. For Task 4, our best score of 0.93 exceeded the mean score of 0.925. For Task 5, our best score of 0.75 exceeded the mean score of 0.745.
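A minimal fine-tuning sketch of this kind of setup with the Hugging Face Transformers library follows; the roberta-base checkpoint, the learning rate, and the toy single-batch step are illustrative assumptions rather than the team's exact configuration.

```python
# Minimal RoBERTa fine-tuning step for binary tweet classification;
# checkpoint, hyperparameters, and data are illustrative assumptions.
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base",
                                                         num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

tweets = ["hypothetical tweet self-reporting a potential COVID-19 case"]
labels = torch.tensor([1])  # 1 = positive class

model.train()
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # cross-entropy computed internally
optimizer.zero_grad()
loss.backward()
optimizer.step()
```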
2020
Medication Mention Detection in Tweets Using ELECTRA Transformers and Decision Trees
Lung-Hao Lee | Po-Han Chen | Hao-Chuan Kao | Ting-Chun Hung | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
This study describes our proposed model design for SMM4H 2020 Task 1. We fine-tune ELECTRA transformers, using our trained SVM filter for data augmentation, along with decision trees to detect medication mentions in tweets. Our best F1-score of 0.7578 exceeded the mean score of 0.6646 across all 15 submitting teams.
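The abstract does not detail how the SVM filter, ELECTRA, and the decision trees are chained, so the following is a loose two-stage sketch in which a fine-tuned ELECTRA classifier produces a mention probability that a scikit-learn decision tree then combines with a simple surface feature. The checkpoint, the feature set, and the toy training data are all illustrative assumptions.

```python
# Loose two-stage sketch: ELECTRA mention probability + decision tree;
# how the paper actually combines these components is an assumption here.
import numpy as np
import torch
from sklearn.tree import DecisionTreeClassifier
from transformers import ElectraForSequenceClassification, ElectraTokenizer

name = "google/electra-base-discriminator"  # illustrative checkpoint choice
tokenizer = ElectraTokenizer.from_pretrained(name)
model = ElectraForSequenceClassification.from_pretrained(name, num_labels=2)

def mention_probability(tweet: str) -> float:
    """Probability that a tweet mentions a medication, per ELECTRA."""
    inputs = tokenizer(tweet, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Second stage: a shallow decision tree over the ELECTRA score plus a
# simple surface feature (tweet length), trained on toy examples.
train_tweets = ["took two aspirin this morning", "great weather today"]
train_labels = [1, 0]
features = np.array([[mention_probability(t), len(t)] for t in train_tweets])
tree = DecisionTreeClassifier(max_depth=2).fit(features, train_labels)
```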
2019
NCUEE at MEDIQA 2019: Medical Text Inference Using Ensemble BERT-BiLSTM-Attention Model
Lung-Hao Lee | Yi Lu | Po-Han Chen | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the 18th BioNLP Workshop and Shared Task
This study describes the model design of the NCUEE system for the MEDIQA challenge at the ACL-BioNLP 2019 workshop. We use BERT (Bidirectional Encoder Representations from Transformers) as the word embedding method, integrating it with a BiLSTM (Bidirectional Long Short-Term Memory) network and an attention mechanism for medical text inference. A total of 42 teams participated in the natural language inference task at MEDIQA 2019. Our best accuracy score of 0.84 ranked in the top third of all submissions on the leaderboard.
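A minimal PyTorch sketch of a BERT-BiLSTM-attention classifier of the kind described follows, assuming additive attention pooling over the BiLSTM states and three inference classes (entailment, contradiction, neutral); the checkpoint, layer sizes, and the omitted ensembling step are illustrative assumptions.

```python
# Minimal BERT-BiLSTM-attention classifier sketch; checkpoint and layer
# sizes are assumptions, and the paper's ensembling step is omitted.
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLstmAttention(nn.Module):
    def __init__(self, hidden=256, num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # scores each time step
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # BERT token embeddings feed a BiLSTM for contextual states.
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        states, _ = self.bilstm(emb)
        # Attention pooling: softmax over time steps, with padding masked out.
        scores = self.attn(states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * states).sum(dim=1)
        return self.classifier(pooled)
```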