Hong-Jie Dai


2020

Cancer Registry Information Extraction via Transfer Learning
Yan-Jie Lin | Hong-Jie Dai | You-Chen Zhang | Chung-Yang Wu | Yu-Cheng Chang | Pin-Jou Lu | Chih-Jen Huang | Yu-Tsang Wang | Hui-Min Hsieh | Kun-San Chao | Tsang-Wu Liu | I-Shou Chang | Yi-Hsin Connie Yang | Ti-Hao Wang | Ko-Jiunn Liu | Li-Tzong Chen | Sheau-Fang Yang
Proceedings of the 3rd Clinical Natural Language Processing Workshop

A cancer registry is a critical and massive database for which various types of domain knowledge are needed and whose maintenance requires labor-intensive data curation. In order to facilitate the curation process for building a high-quality and integrated cancer registry database, we compiled a cross-hospital corpus and applied neural network methods to develop a natural language processing system for extracting cancer registry variables buried in unstructured pathology reports. The performance of the developed networks was compared with various baselines using standard micro-precision, recall and F-measure. Furthermore, we conducted experiments to study the feasibility of applying transfer learning to rapidly develop a well-performing system for processing reports from different sources that might be presented in different writing styles and formats. The results demonstrate that the transfer learning method enables us to develop a satisfactory system for a new hospital with only a few annotations and suggest more opportunities to reduce the burden of cancer registry curation.

ISLab System for SMM4H Shared Task 2020
Chen-Kai Wang | Hong-Jie Dai | You-Chen Zhang | Bo-Chun Xu | Bo-Hong Wang | You-Ning Xu | Po-Hao Chen | Chung-Hong Lee
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

In this paper, we describe our systems for the first and second subtasks of the Social Media Mining for Health Applications (SMM4H) shared task in 2020. The two subtasks are automatic classification of medication mentions and adverse effects in tweets. Our systems for both subtasks are based on the Robustly optimized BERT approach (RoBERTa) and our previous work at SMM4H’19. The best F1-scores achieved by our systems for subtasks 1 and 2 were 0.7974 and 0.64, respectively, which outperformed the average F1-scores among all teams’ best runs by at least 0.13.

2019

BIGODM System in the Social Media Mining for Health Applications Shared Task 2019
Chen-Kai Wang | Hong-Jie Dai | Bo-Hung Wang
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

In this study, we describe our methods to automatically classify Twitter posts conveying events of adverse drug reaction (ADR). Based on our previous experience in tackling the ADR classification task, we empirically applied the vote-based under-sampling ensemble (VUE) approach along with a linear support vector machine (SVM) to develop our classifiers as part of our participation in the ACL 2019 Social Media Mining for Health Applications (SMM4H) shared task 1. The best-performing model on the test sets was trained on a merged corpus consisting of the datasets released by SMM4H 2017 and 2019. By using VUE, the corpus was randomly under-sampled with a 2:1 ratio between the negative and positive classes to create an ensemble using the linear kernel trained with features including bag-of-words, domain knowledge, negation and word embeddings. The best-performing model achieved an F-measure of 0.551, which is about 5% higher than the average F-score of 16 teams.

2017

Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)
Jitendra Jonnagaddala | Hong-Jie Dai | Yung-Chun Chang
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

Incorporating Dependency Trees Improve Identification of Pregnant Women on Social Media Platforms
Yi-Jie Huang | Chu Hsien Su | Yi-Chun Chang | Tseng-Hsin Ting | Tzu-Yuan Fu | Rou-Min Wang | Hong-Jie Dai | Yung-Chun Chang | Jitendra Jonnagaddala | Wen-Lian Hsu
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

The increasing popularity of social media leads users to share an enormous amount of information on the internet. This information has various applications; for example, it can be used to develop models to understand or predict user behavior on social media platforms. A few online retailers, for instance, have studied shopping patterns to predict shoppers’ pregnancy stage. Another interesting application is to use social media platforms to analyze users’ health-related information. A new corpus from the popular social media platform Twitter was developed for the purpose of this study. Using this corpus, we developed a tree kernel-based model to classify tweets conveying pregnancy-related information. The developed pregnancy classification model achieved an accuracy of 0.847 and an F-score of 0.565. In the future, we would like to improve this corpus by reducing noise such as retweets.

Using a Recurrent Neural Network Model for Classification of Tweets Conveyed Influenza-related Information
Chen-Kai Wang | Onkar Singh | Zhao-Li Tang | Hong-Jie Dai
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

Traditional disease surveillance systems depend on outpatient reporting and virological test results released by hospitals. These data contain valid and accurate information about emerging outbreaks, but they are often not timely. In recent years, the exponential growth in the number of users connected to social media has provided immense knowledge about epidemics through the sharing of related information. Social media can now flag more immediate concerns related to outbreaks in real time. In this paper, we apply the long short-term memory recurrent neural network (RNN) architecture to classify tweets conveying influenza-related information and compare its performance with baseline algorithms including support vector machine (SVM), decision tree, naive Bayes, simple logistics, and naive Bayes multinomial. The developed RNN model achieved an F-score of 0.845 on the MedWeb task test set, which outperforms the F-score of SVM without applying the synthetic minority oversampling technique by 0.08. The F-score of the RNN model is within 1% of the highest score achieved by SVM with the oversampling technique.

2016

Combining Multiple Classifiers Using Global Ranking for ReachOut.com Post Triage
Chen-Kai Wang | Hong-Jie Dai | Chih-Wei Chen | Jitendra Jonnagaddala | Nai-Wen Chang
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

2015

A preliminary study on automatic identification of patient smoking status in unstructured electronic health records
Jitendra Jonnagaddala | Hong-Jie Dai | Pradeep Ray | Siaw-Teng Liaw
Proceedings of BioNLP 15

TMUNSW: Identification of Disorders and Normalization to SNOMED-CT Terminology in Unstructured Clinical Notes
Jitendra Jonnagaddala | Siaw-Teng Liaw | Pradeep Ray | Manish Kumar | Hong-Jie Dai
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

TMUNSW: Disorder Concept Recognition and Normalization in Clinical Notes for SemEval-2014 Task 7
Jitendra Jonnagaddala | Manish Kumar | Hong-Jie Dai | Enny Rachmani | Chien-Yeh Hsu
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Joint Learning of Entity Linking Constraints Using a Markov-Logic Network
Hong-Jie Dai | Richard Tzong-Han Tsai | Wen-Lian Hsu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 1, March 2014

2011

Entity Disambiguation Using a Markov-Logic Network
Hong-Jie Dai | Richard Tzong-Han Tsai | Wen-Lian Hsu
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Global Ranking via Data Fusion
Hong-Jie Dai | Po-Ting Lai | Richard Tzong-Han Tsai | Wen-Lian Hsu
Coling 2010: Posters

2006

On Closed Task of Chinese Word Segmentation: An Improved CRF Model Coupled with Character Clustering and Automatically Generated Template Matching
Richard Tzong-Han Tsai | Hsieh-Chuan Hung | Cheng-Lung Sung | Hong-Jie Dai | Wen-Lian Hsu
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing