Shih-Hung Wu

2021

CYUT at ROCLING-2021 Shared Task: Based on BERT and MacBERT
Xie-Sheng Hong | Shih-Hung Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

This paper describes our system for the ROCLING 2021 shared task on dimensional sentiment analysis for educational texts. We submitted two runs in the final test. Both runs use a standard regression model. Run1 uses the Chinese version of BERT as its base, and Run2 uses an early version of MacBERT, the Chinese RoBERTa-like BERT model RoBERTa-wwm-ext. In both runs, the pre-trained BERT model provides the text embeddings used to train the regression model.

2020

CYUT Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2020 CGED Shared Task
Shih-Hung Wu | Junwei Wang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

This paper reports on our Chinese Grammatical Error Diagnosis system in the NLPTEA-2020 CGED shared task. In 2020, we submitted two runs with two approaches. The first combines conditional random fields (CRF) with a BERT-based deep-learning model. The second uses a BERT-based deep-learning model alone. The official results show that our run1 achieved the highest precision, 0.9875, with the lowest false positive rate, 0.0163, on detection, while run2 gives a more balanced performance.

Learning the Human Judgment for the Automatic Evaluation of Chatbot
Shih-Hung Wu | Sheng-Lun Chien
Proceedings of the Twelfth Language Resources and Evaluation Conference

It is hard to evaluate the quality of text generated by a generative dialogue system. Currently, dialogue evaluation relies on human judges to label the quality of the generated text. This is not a reusable mechanism that can give system developers consistent evaluations. We believe that it is easier to get consistent results when comparing dialogues generated by two systems than when assigning a quality score to one system at a time. In this paper, we propose a machine learning approach that reduces the effort of human evaluation by learning the human judgment on comparisons between two dialogue systems. Trained on the human labeling results, the evaluation model learns which generative model is better in each dialogue context. System developers can therefore use it to compare fine-tuned models repeatedly without human labor. In our experiment, the agreement between the learned model and the human judges is 70%. The experiment compares two attention-based GRU-RNN generative models.

2019

基於Seq2Seq模型的中文文法錯誤診斷系統(A Chinese Grammatical Error Diagnosis System Based on Seq2Seq Model)
Jun-Wei Wang | Sheng-Lun Chien | Yi-Kun Chen | Shih-Hung Wu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2018

A Short Answer Grading System in Chinese by Support Vector Approach
Shih-Hung Wu | Wen-Feng Shih
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this paper, we report a short answer grading system for Chinese. We build the system with standard machine learning approaches and test it on corpora translated from two publicly available English corpora. The experimental results on the two translated corpora are similar to those obtained on the English originals.

CYUT-III Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2018 CGED Shared Task
Shih-Hung Wu | Jun-Wei Wang | Liang-Pu Chen | Ping-Che Yang
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper reports how we built a Chinese Grammatical Error Diagnosis system for the NLPTEA-2018 CGED shared task. In 2018, we submitted three runs with three different approaches. The first is a pattern-based approach that matches frequent error patterns. The second is a sequential labelling approach using conditional random fields (CRF). The third is a rewriting approach using a sequence-to-sequence (seq2seq) model. The three approaches have different properties and aim to optimize different performance metrics, and the formal run results show the differences we expected.

2017

CYUT at IJCNLP-2017 Task 3: System Report for Review Opinion Diversification
Shih-Hung Wu | Su-Yu Chang | Liang-Pu Chen
Proceedings of the IJCNLP 2017, Shared Tasks

Review Opinion Diversification (RevOpiD) 2017 is a shared task held at the International Joint Conference on Natural Language Processing (IJCNLP). The shared task aims at selecting the top-k reviews, as a summary, from a set of reviews. There are three subtasks in RevOpiD: helpfulness ranking, representativeness ranking, and exhaustive coverage ranking. This year, our team submitted runs from three models. We focus on ranking reviews by their helpfulness. In the first two models, we use linear regression with two different loss functions: the first is least squares, and the second is cross entropy. The third run is a random baseline. For both k=5 and k=10, our second model gets the best scores in the official evaluation metrics.

2016

CYUT-III System at Chinese Grammatical Error Diagnosis Task
Po-Lin Chen | Shih-Hung Wu | Liang-Pu Chen | Ping-Che Yang
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper describes the CYUT-III system for grammar error detection in the 2016 NLP-TEA Chinese Grammar Error Detection (CGED) shared task. In this task, a system has to detect four types of errors: redundant word errors, missing word errors, word selection errors, and word ordering errors. Based on the conditional random fields (CRF) model, our system is a linear tagger that can detect errors in learners’ essays. Since system performance depends heavily on the features, this paper reports how we integrate the collocation feature into the CRF model. Our system achieves the best detection accuracy and identification accuracy on the TOCFL dataset, which is in traditional Chinese. The same system also works well on the simplified Chinese HSK dataset.

以語言模型評估學習者文句修改前後之流暢度(Using language model to assess the fluency of learners sentences edited by teachers)[In Chinese]
Guan-Ying Pu | Po-Lin Chen | Shih-Hung Wu
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)