Tzu-Mi Lin
Also published as: Tzu-mi Lin
2026
DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis
Lung-Hao Lee | Liang-Chih Yu | Natalia V Loukachevitch | Ilseyar Alimova | Alexander Panchenko | Tzu-Mi Lin | Zhe-Yu Xu | Jian-Yu Zhou | Guangmin Zheng | Jin Wang | Sharanya Awasthi | Jonas Becker | Jan Philip Wahle | Terry Ruas | Shamsuddeen Hassan Muhammad | Saif M. Mohammad
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Aspect-Based Sentiment Analysis (ABSA) focuses on extracting sentiment at a fine-grained aspect level and has been widely applied across real-world domains. However, existing ABSA research relies on coarse-grained categorical labels (e.g., positive, negative), which limits its ability to capture nuanced affective states. To address this limitation, we adopt a dimensional approach that represents sentiment with continuous valence–arousal (VA) scores, enabling fine-grained analysis at both the aspect and sentiment levels. To this end, we introduce DimABSA, the first multilingual, dimensional ABSA resource annotated with both traditional ABSA elements (aspect terms, aspect categories, and opinion terms) and newly introduced VA scores. This resource contains 76,958 aspect instances across 42,590 sentences, spanning six languages and four domains. We further introduce three subtasks that combine VA scores with different ABSA elements, providing a bridge from traditional ABSA to dimensional ABSA. Given that these subtasks involve both categorical and continuous outputs, we propose a new unified metric, continuous F1 (cF1), which incorporates VA prediction error into standard F1. We provide a comprehensive benchmark using both prompted and fine-tuned large language models across all subtasks. Our results show that DimABSA is a challenging benchmark and provides a foundation for advancing multilingual dimensional ABSA. We publicly released the DimABSA dataset, which was used for Track A of SemEval-2026 Task 3, attracting over 300 participants.
2025
ROCLING-2025 Shared Task: Chinese Dimensional Sentiment Analysis for Medical Self-Reflection Texts
Lung-Hao Lee | Tzu-Mi Lin | Hsiu-Min Shih | Kuo-Kai Shyu | Anna S. Hsu | Peih-Ying Lu
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
This paper describes the ROCLING-2025 shared task on Chinese dimensional sentiment analysis for medical self-reflection texts, including task organization, data preparation, performance metrics, and evaluation results. A total of six participating teams submitted results for techniques developed for valence-arousal intensity prediction. All datasets with gold standards and evaluation scripts used in this shared task are publicly available online for further research.
2024
NYCU-NLP at EXALT 2024: Assembling Large Language Models for Cross-Lingual Emotion and Trigger Detection
Tzu-Mi Lin | Zhe-Yu Xu | Jian-Yu Zhou | Lung-Hao Lee
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
This study describes the model design of the NYCU-NLP system for the EXALT shared task at the WASSA 2024 workshop. We instruction-tune several large language models and then assemble various model combinations as our main system architecture for cross-lingual emotion and trigger detection in tweets. Experimental results show that our best-performing submission is an assembly of the Starling (7B) and Llama 3 (8B) models. Our submission was ranked sixth of 17 participating systems for the emotion detection subtask, and fifth of 7 systems for the binary trigger detection subtask.
NYCU-NLP at SemEval-2024 Task 2: Aggregating Large Language Models in Biomedical Natural Language Inference for Clinical Trials
Lung-hao Lee | Chen-ya Chiou | Tzu-mi Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This study describes the model design of the NYCU-NLP system for SemEval-2024 Task 2, which focuses on natural language inference for clinical trials. We aggregate several large language models to determine the inference relation (i.e., entailment or contradiction) between clinical trial reports and statements that may be manipulated with designed interventions to investigate the faithfulness and consistency of the developed models. First, we use ChatGPT v3.5 to augment original statements in training data and then fine-tune the SOLAR model with all augmented data. During the testing inference phase, we fine-tune the OpenChat model to reduce the influence of interventions and feed the cleaned statement into the fine-tuned SOLAR model for label prediction. Our submission produced a faithfulness score of 0.9236, ranking second of 32 participating teams, and ranked first for consistency with a score of 0.8092.
2023
NCUEE-NLP at WASSA 2023 Shared Task 1: Empathy and Emotion Prediction Using Sentiment-Enhanced RoBERTa Transformers
Tzu-Mi Lin | Jung-Ying Chang | Lung-Hao Lee
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
This paper describes our proposed system design for the WASSA 2023 shared task 1. We propose a unified architecture of ensemble neural networks to integrate the original RoBERTa transformer with two sentiment-enhanced models, RoBERTa-Twitter and EmoBERTa. For Track 1 at the speech-turn level, our best submission achieved an average Pearson correlation score of 0.7236, ranking fourth for empathy, emotion polarity and emotion intensity prediction. For Track 2 at the essay level, our best submission obtained an average Pearson correlation score of 0.4178 for predicting empathy and distress scores, ranking first among all nine submissions.
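The "average Pearson correlation score" reported above can be illustrated with a minimal sketch. It assumes the score is the unweighted mean of per-target Pearson coefficients; the function names and the dictionary layout are hypothetical and are not taken from the shared task's evaluation script.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_pearson(gold: Dict[str, Sequence[float]],
                    pred: Dict[str, Sequence[float]]) -> float:
    """Unweighted mean of per-target Pearson scores (assumed averaging scheme)."""
    scores = [pearson(gold[target], pred[target]) for target in gold]
    return sum(scores) / len(scores)
```

For example, `gold` might map hypothetical target names such as "empathy" and "distress" to lists of gold scores, with `pred` holding the system's predictions under the same keys.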
Overview of the ROCLING 2023 Shared Task for Chinese Multi-genre Named Entity Recognition in the Healthcare Domain
Lung-Hao Lee | Tzu-Mi Lin | Chao-Yi Chen
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
2022
NCUEE-NLP@SMM4H’22: Classification of Self-reported Chronic Stress on Twitter Using Ensemble Pre-trained Transformer Models
Tzu-Mi Lin | Chao-Yi Chen | Yu-Wen Tzeng | Lung-Hao Lee
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
This study describes our proposed system design for the SMM4H 2022 Task 8. We fine-tune the BERT, RoBERTa, ALBERT, XLNet and ELECTRA transformers and their connecting classifiers. Each transformer model is regarded as a standalone method to detect tweets that self-report chronic stress. The outputs of the individual models are then combined using a majority voting ensemble mechanism to produce the final classification. Experimental results indicate that our approach achieved a best F1-score of 0.73 over the positive class.
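The majority voting step described above can be sketched as follows. This is a minimal illustration that assumes each model emits one hard label per tweet; the function name `majority_vote` and the input layout are hypothetical and not taken from the authors' system.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is the label that model m assigns to item i.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must predict the same number of items")

    combined = []
    # zip(*predictions) yields, for each item, the tuple of votes across models.
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count;
        # on a tie it returns the tied label that was seen first.
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Five hypothetical models voting on three tweets (1 = positive class).
votes_by_model = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
]
final_labels = majority_vote(votes_by_model)
```

With an odd number of models and binary labels, as in the five-transformer setup described here, ties cannot occur.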
NCUEE-NLP at SemEval-2022 Task 11: Chinese Named Entity Recognition Using the BERT-BiLSTM-CRF Model
Lung-Hao Lee | Chien-Huan Lu | Tzu-Mi Lin
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This study describes the model design of the NCUEE-NLP system for the Chinese track of the SemEval-2022 MultiCoNER task. We use BERT embeddings for character representation and train a BiLSTM-CRF model to recognize complex named entities. A total of 21 teams participated in this track, with each team allowed a maximum of six submissions. Our best submission, with a macro-averaged F1-score of 0.7418, ranked seventh out of 21 teams.
Co-authors
- Lung-Hao Lee 8
- Chao-Yi Chen 2
- Zhe-Yu Xu 2
- Jian-Yu Zhou 2
- Ilseyar Alimova 1
- Sharanya Awasthi 1
- Jonas Becker 1
- Jung-Ying Chang 1
- Chen-ya Chiou 1
- Anna S. Hsu 1
- Natalia V Loukachevitch 1
- Chien-Huan Lu 1
- Peih-Ying Lu 1
- Saif Mohammad 1
- Shamsuddeen Hassan Muhammad 1
- Alexander Panchenko 1
- Terry Ruas 1
- Hsiu-Min Shih 1
- Kuo-Kai Shyu 1
- Yu-Wen Tzeng 1
- Jan Philip Wahle 1
- Jin Wang 1
- Liang-Chih Yu 1
- Guangmin Zheng 1