Yi-Min Jian


2026

This paper describes CYUT’s system for SemEval-2026 Task 3 Track B, a multilingual aspect-based dimensional sentiment regression task. We formulate the task as continuous Valence–Arousal (VA) prediction and adopt a multi-task learning (MTL) framework with auxiliary tasks automatically derived from gold VA annotations, including polarity, intensity, and quadrant classification. However, these coarse-grained labels may still suffer from regional imbalance in the VA space, leaving some regions with insufficient auxiliary supervision. To address this issue, we extend the system with Polar Multi-Zone Labeling (PMZL) and use its seven-zone variant, PMZL-7. PMZL-7 partitions the VA plane into one core neutral region and six non-central zones based on the directional distribution of non-central samples. This design reduces the risk of auxiliary-label imbalance while supplementing directional information that conventional auxiliary tasks cannot directly capture. We evaluate XLM-R and two generative pretrained models. Results show that PMZL-7 is strongly model-dependent: it provides more stable improvements for Qwen2 and Ministral, while its effect on XLM-R is less consistent. On the official test set, our system achieves the best performance on the NigerianPidgin subset among all participating systems.
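The labeling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-9 VA scale, the midpoint `CENTER`, the core radius `CORE_RADIUS`, the polarity margin, and the equal-frequency rule for choosing angular boundaries are all assumptions made for illustration, and every function name is hypothetical.

```python
import bisect
import math

CENTER = 5.0       # assumption: midpoint of a 1-9 valence/arousal scale
CORE_RADIUS = 1.0  # assumption: radius of the neutral core region


def polarity(valence, margin=0.5):
    """Coarse polarity label from valence (margin is an assumed threshold)."""
    if valence > CENTER + margin:
        return "positive"
    if valence < CENTER - margin:
        return "negative"
    return "neutral"


def quadrant(valence, arousal):
    """Quadrant label: high/low valence combined with high/low arousal."""
    v = "HV" if valence >= CENTER else "LV"
    a = "HA" if arousal >= CENTER else "LA"
    return v + a


def _radius(valence, arousal):
    return math.hypot(valence - CENTER, arousal - CENTER)


def _angle(valence, arousal):
    # Direction of the point around the center, mapped to [0, 2*pi).
    return math.atan2(arousal - CENTER, valence - CENTER) % (2 * math.pi)


def fit_zone_edges(points, n_zones=6):
    """Pick angular boundaries so non-central samples split about evenly.

    Assumption: "directional distribution" is read here as equal-frequency
    binning of angles; the original system may use a different rule.
    """
    angles = sorted(_angle(v, a) for v, a in points
                    if _radius(v, a) > CORE_RADIUS)
    return [angles[(k * len(angles)) // n_zones] for k in range(1, n_zones)]


def pmzl_zone(valence, arousal, edges):
    """Return 0 for the neutral core, otherwise a directional zone in 1..6."""
    if _radius(valence, arousal) <= CORE_RADIUS:
        return 0
    return 1 + bisect.bisect_right(edges, _angle(valence, arousal))
```

In this sketch, `fit_zone_edges` would be run once on the training annotations, and `pmzl_zone` would then supply a seven-class auxiliary target alongside the polarity and quadrant labels.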
This study addresses SemEval-2026 Task 9 on Detecting Multilingual, Multicultural, and Multievent Online Polarization and compares monolingual and multilingual LoRA (Low-Rank Adaptation) fine-tuning for detecting online polarization. Online polarization is not only a linguistic phenomenon but also a complex sociolinguistic problem shaped by cultural context and event background. To move beyond prior work that treats polarization only as binary classification, we participate in all three subtasks: Subtask 1, Polarization Detection; Subtask 2, Polarization Type Classification (e.g., politics, religion); and Subtask 3, Manifestation Identification, which analyzes the rhetorical strategies that construct polarization, such as stereotypes and dehumanization narratives. We aim to establish a more contextually grounded and diagnostic analysis framework that improves generalization and fairness in cross-lingual settings. We explore different fine-tuning configurations and combine them into a robust ensemble system. Experimental results show that our approach performs especially well on Chinese, ranking 1st in Subtask 1 (Polarization Detection) for Chinese. Furthermore, while the monolingual LoRA strategy performs strongly in specific languages such as Chinese, ensembling it with multilingual LoRA models provides the diverse features crucial for identifying complex cross-cultural rhetoric.

2025

Accurately modeling physicians’ emotional states from self-reflection texts remains challenging due to the low-resource, domain-specific nature of medical corpora. The proposed workflow combines Retrieval-Augmented Generation (RAG) with multi-teacher pseudo-labeling to generate high-quality augmented data. This workflow enables effective cross-domain adaptation from general text corpora to professional medical texts. Evaluations on the ROCLING 2025 test set demonstrate substantial improvements over the best-performing baseline in Valence–Arousal prediction accuracy and model stability. Importantly, the workflow is domain-agnostic and provides a generalizable methodology for systematically transferring models to new, low-resource domains, making it applicable beyond medical text analysis.