Tzu-Mi Lin

Also published as: Tzu-mi Lin

2026

Aspect-Based Sentiment Analysis (ABSA) focuses on extracting sentiment at a fine-grained aspect level and has been widely applied across real-world domains. However, existing ABSA research relies on coarse-grained categorical labels (e.g., positive, negative), which limits its ability to capture nuanced affective states. To address this limitation, we adopt a dimensional approach that represents sentiment with continuous valence–arousal (VA) scores, enabling fine-grained analysis at both the aspect and sentiment levels. To this end, we introduce DimABSA, the first multilingual, dimensional ABSA resource annotated with both traditional ABSA elements (aspect terms, aspect categories, and opinion terms) and newly introduced VA scores. This resource contains 76,958 aspect instances across 42,590 sentences, spanning six languages and four domains. We further introduce three subtasks that combine VA scores with different ABSA elements, providing a bridge from traditional ABSA to dimensional ABSA. Given that these subtasks involve both categorical and continuous outputs, we propose a new unified metric, continuous F1 (cF1), which incorporates VA prediction error into standard F1. We provide a comprehensive benchmark using both prompted and fine-tuned large language models across all subtasks. Our results show that DimABSA is a challenging benchmark and provides a foundation for advancing multilingual dimensional ABSA. We publicly released the DimABSA dataset, which was used for Track A of SemEval-2026 Task 3, attracting over 300 participants.

2025

pdf bib abs

ROCLING-2025 Shared Task: Chinese Dimensional Sentiment Analysis for Medical Self-Reflection Texts
Lung-Hao Lee | Tzu-Mi Lin | Hsiu-Min Shih | Kuo-Kai Shyu | Anna S. Hsu | Peih-Ying Lu
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

This paper describes the ROCLING-2025 shared task aimed at Chinese dimensional sentiment analysis for medical self-refection texts, including task organization, data preparation, performance metrics, and evaluation results. A total of six participating teams submitted results for techniques developed for valence-arousal intensity prediction. All datasets with gold standards and evaluation scripts used in this shared task are publicly available online for further research.

2024

pdf bib abs

NYCU-NLP at EXALT 2024: Assembling Large Language Models for Cross-Lingual Emotion and Trigger Detection
Tzu-Mi Lin | Zhe-Yu Xu | Jian-Yu Zhou | Lung-Hao Lee
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

This study describes the model design of the NYCU-NLP system for the EXALT shared task at the WASSA 2024 workshop. We instruction-tune several large language models and then assemble various model combinations as our main system architecture for cross-lingual emotion and trigger detection in tweets. Experimental results showed that our best performing submission is an assembly of the Starling (7B) and Llama 3 (8B) models. Our submission was ranked sixth of 17 participating systems for the emotion detection subtask, and fifth of 7 systems for the binary trigger detection subtask.

pdf bib abs

NYCU-NLP at SemEval-2024 Task 2: Aggregating Large Language Models in Biomedical Natural Language Inference for Clinical Trials
Lung-hao Lee | Chen-ya Chiou | Tzu-mi Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study describes the model design of the NYCU-NLP system for the SemEval-2024 Task 2 that focuses on natural language inference for clinical trials. We aggregate several large language models to determine the inference relation (i.e., entailment or contradiction) between clinical trial reports and statements that may be manipulated with designed interventions to investigate the faithfulness and consistency of the developed models. First, we use ChatGPT v3.5 to augment original statements in training data and then fine-tune the SOLAR model with all augmented data. During the testing inference phase, we fine-tune the OpenChat model to reduce the influence of interventions and fed a cleaned statement into the fine-tuned SOLAR model for label prediction. Our submission produced a faithfulness score of 0.9236, ranking second of 32 participating teams, and ranked first for consistency with a score of 0.8092.

2023

pdf bib abs

NCUEE-NLP at WASSA 2023 Shared Task 1: Empathy and Emotion Prediction Using Sentiment-Enhanced RoBERTa Transformers
Tzu-Mi Lin | Jung-Ying Chang | Lung-Hao Lee
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

This paper describes our proposed system design for the WASSA 2023 shared task 1. We propose a unified architecture of ensemble neural networks to integrate the original RoBERTa transformer with two sentiment-enhanced RoBERTa-Twitter and EmoBERTa models. For Track 1 at the speech-turn level, our best submission achieved an average Pearson correlation score of 0.7236, ranking fourth for empathy, emotion polarity and emotion intensity prediction. For Track 2 at the essay-level, our best submission obtained an average Pearson correlation score of 0.4178 for predicting empathy and distress scores, ranked first among all nine submissions.

pdf bib

Overview of the ROCLING 2023 Shared Task for Chinese Multi-genre Named Entity Recognition in the Healthcare Domain
Lung-Hao Lee | Tzu-Mi Lin | Chao-Yi Chen
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

pdf bib abs

NCUEE-NLP@SMM4H’22: Classification of Self-reported Chronic Stress on Twitter Using Ensemble Pre-trained Transformer Models
Tzu-Mi Lin | Chao-Yi Chen | Yu-Wen Tzeng | Lung-Hao Lee
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This study describes our proposed system design for the SMM4H 2022 Task 8. We fine-tune the BERT, RoBERTa, ALBERT, XLNet and ELECTRA transformers and their connecting classifiers. Each transformer model is regarded as a standalone method to detect tweets that self-reported chronic stress. The final output classification result is then combined using the majority voting ensemble mechanism. Experimental results indicate that our approach achieved a best F1-score of 0.73 over the positive class.

pdf bib abs

NCUEE-NLP at SemEval-2022 Task 11: Chinese Named Entity Recognition Using the BERT-BiLSTM-CRF Model
Lung-Hao Lee | Chien-Huan Lu | Tzu-Mi Lin
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study describes the model design of the NCUEE-NLP system for the Chinese track of the SemEval-2022 MultiCoNER task. We use the BERT embedding for character representation and train the BiLSTM-CRF model to recognize complex named entities. A total of 21 teams participated in this track, with each team allowed a maximum of six submissions. Our best submission, with a macro-averaging F1-score of 0.7418, ranked the seventh position out of 21 teams.

Venues

WS1

Fix author

Tzu-Mi Lin

2026

2025

2024

2023

2022

Co-authors

Venues