Shiquan Wang

Also published as: 王士权


2026

This paper presents our top-ranking system for SemEval-2026 Task 13 on code generation detection under multi-lingual and distribution-shift settings. Our approach achieved 1st place in Subtasks A and B, and 2nd place in Subtask C in the official evaluation. Our framework integrates data-centric analysis, full-parameter model adaptation, and multi-level ensemble learning. We first analyze label and length distributions and apply repeated oversampling to address class imbalance. We then optimize prompts in a data-driven manner to improve inference stability. Based on Qwen3-30B-A3B-Instruct, we conduct full-parameter fine-tuning with diverse training configurations and integrate multiple checkpoints using soft voting, hard voting, logits-based voting, and LightGBM stacking. Experimental results demonstrate substantial improvements over zero-shot baselines and consistent gains from ensemble strategies, validating the effectiveness of systematic adaptation and ensembling for robust code generation detection.
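To illustrate how soft voting and hard voting can disagree, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the two-class setup, the tie-breaking rule, and the probabilities are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def soft_vote(prob_lists):
    """Average per-class probabilities across models, then take the argmax."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

def hard_vote(prob_lists):
    """Each model votes for its argmax class; the majority wins.

    Ties are broken in favor of the lowest class index (an assumption).
    """
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in prob_lists]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

# Hypothetical probabilities from three checkpoints for one sample
# (class 0 = human-written, class 1 = machine-generated).
checkpoint_probs = [[0.45, 0.55], [0.48, 0.52], [0.95, 0.05]]
print(soft_vote(checkpoint_probs))  # 0: one very confident model dominates the average
print(hard_vote(checkpoint_probs))  # 1: two of three models vote for class 1
```

The two rules differ whenever a confident minority outweighs a hesitant majority, which is one reason an ensemble might combine several voting schemes.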
This paper describes TeleAI’s system for SemEval-2026 Task 3, Track A, Subtask 1 (DimASR), which focuses on predicting continuous Valence-Arousal (VA) scores for specific aspects in text. We frame this task as an end-to-end regression problem and propose a robust framework utilizing Qwen2.5-7B as the feature extraction backbone, combined with parameter-efficient fine-tuning via LoRA. To enhance model generalization and mitigate domain shifts, we primarily leverage multilingual and multi-domain mixed training. Furthermore, our system integrates several optimization and robustness techniques to stabilize continuous score prediction, including R-Drop-style consistency regularization, embedding-level PGD adversarial training, Smooth L1 (Huber) loss, sigmoid-based output interval mapping, and post-hoc linear calibration. Our comprehensive ablations demonstrate that the combination of joint training and robustness regularizations substantially reduces the official evaluation metric, $\mathrm{RMSE}_{VA}$. The proposed system achieves highly competitive performance across multiple language and domain settings, demonstrating the efficacy of applying lightweight LLM adaptation for dimensional aspect-based sentiment analysis.
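Three of the techniques named above (sigmoid-based interval mapping, Smooth L1 loss, and post-hoc linear calibration) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the score range of [1, 9], and `beta = 1.0` are hypothetical and not taken from the paper.

```python
import math

def map_to_interval(logit, low=1.0, high=9.0):
    """Squash an unbounded regression output into [low, high] with a sigmoid.

    The [1, 9] range is an assumption made for this sketch.
    """
    return low + (high - low) / (1.0 + math.exp(-logit))

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def fit_linear_calibration(preds, targets):
    """Least-squares fit of targets ~ a * preds + b on held-out data."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in preds)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    a = cov / var_p
    return a, mean_t - a * mean_p

# Hypothetical usage: calibrate raw predictions against development-set labels.
a, b = fit_linear_calibration([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
calibrated = a * 2.5 + b
```

The sigmoid mapping guarantees that predictions stay inside the valid score range, while the linear calibration step corrects any systematic scale or offset error after training.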
This paper describes our framework for SemEval-2026 Task 6 (CLARITY - Unmasking Political Question Evasions), which focuses on classifying clarity and fine-grained evasion types in political question-answering dialogues. We propose CAMSR-CoT, a confidence-aware multi-stage reasoning framework that unifies the two subtasks through hierarchical label modeling. The framework adopts a confidence-based routing strategy: high-certainty cases are directly resolved, while ambiguous samples are routed to deeper Chain-of-Thought reasoning stages with boundary-aware few-shot exemplars to mitigate label confusion. On the development set, our framework achieves Macro-F1 scores of 0.812 on SubTask 1 and 0.617 on SubTask 2. On the official hidden test set, it ranks 1st in both SubTask 1 (Macro-F1 = 0.89) and SubTask 2 (Macro-F1 = 0.68).
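The confidence-based routing idea can be sketched as a simple threshold rule. The sketch below is an assumption-laden illustration, not the CAMSR-CoT implementation: the function name `route`, the threshold of 0.8, and the label strings and probabilities are hypothetical.

```python
def route(label_probs, threshold=0.8):
    """Decide whether a sample is resolved directly or sent to deeper reasoning.

    Returns ("direct", label) when the most probable label clears the
    threshold, and ("cot", None) otherwise.
    """
    label, confidence = max(label_probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return "direct", label
    return "cot", None

# Hypothetical first-stage outputs for two question-answer pairs.
confident = {"Clear Reply": 0.92, "Ambivalent": 0.05, "Clear Non-Reply": 0.03}
ambiguous = {"Clear Reply": 0.50, "Ambivalent": 0.40, "Clear Non-Reply": 0.10}
print(route(confident))  # ('direct', 'Clear Reply')
print(route(ambiguous))  # ('cot', None)
```

A rule like this spends the more expensive Chain-of-Thought stage only on samples where the first-stage prediction is uncertain.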
This paper presents a unified, task-adaptive modeling framework for the two tracks of SemEval-2026 Task 4: Narrative Similarity. For Track A, we build a three-stage pipeline of three-dimensional narrative-anchored chain-of-thought (CoT) reasoning, multi-view data augmentation, and Low-Rank Adaptation (LoRA) fine-tuning. For Track B, we design an architecture fully aligned with the ranking inference pipeline and task objective, along with corresponding data augmentation and expansion methods, and propose Smooth Cosine Contrastive Loss (SCCL) to stabilize training in low-resource settings. Systematic experiments verify the effectiveness of each core module, and our final systems rank 4th in both tracks, providing a reproducible technical solution for few-shot similarity modeling.

2025

This paper presents the approach we employed in SemEval-2025 Task 11: “Bridging the Gap in Text-Based Emotion Detection.” The core objective of this shared task is emotion perception, focusing on determining the emotion the speaker is likely expressing when uttering a sentence or short text fragment, as perceived by the majority. In this task, we applied a prompt optimization strategy based on in-context learning, combined with data augmentation and ensemble voting techniques, to significantly enhance the model’s performance. Through these optimizations, the model demonstrated improved accuracy and stability in emotion detection. Ultimately, in both Track A (Multi-label Emotion Detection) and Track B (Emotion Intensity Prediction), our approach achieved top-3 rankings across multiple languages, showcasing the effectiveness and cross-lingual adaptability of our method.

2024

This paper describes the participation of team “TeleAI” in the third International Ancient Chinese Language Information Processing Evaluation (EvalHan24). The competition comprises a joint task of sentence segmentation and punctuation, categorized into open and closed tracks based on the models and data used. In the final evaluation, our system achieved significantly better results than the baseline. Specifically, in the closed-track sentence segmentation task, we obtained an F1 score of 0.8885, while in the sentence punctuation task, we achieved an F1 score of 0.7129.
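This paper describes the system submitted by team “TeleAI” to the CCL2024 Classical Chinese Historical Event Type Extraction evaluation task (CHED2024). The task aims to automatically identify event trigger words and event types in ancient texts, where event type classification is divided into coarse-grained and fine-grained parts. To improve the performance of historical event type extraction from classical Chinese, we combine large and small models and adopt a semi-supervised self-training method. In the final evaluation, we scored 0.763 on trigger word identification, 0.842 on coarse-grained event type classification, and 0.779 on fine-grained event type classification, with an overall score of 0.791, ranking first on every individual task and on the overall score.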
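This technical report details the methods and results of our team’s participation in the 4th Chinese Spatial Semantic Understanding Evaluation (SpaCE2024). SpaCE2024 aims to comprehensively test machines’ ability to understand Chinese spatial semantics and comprises five different tasks: spatial information entity recognition, spatial information entity recognition, spatial information anomaly recognition, spatial orientation information reasoning, and recognition of spatial expressions that differ in form but share the same meaning. Our team combines carefully designed prompts with fine-tuning to elicit the spatial semantic understanding ability of large language models, building an efficient spatial semantic understanding system. In the final evaluation, we achieved an accuracy of 0.8947 on the spatial information entity recognition questions, 0.9364 on the spatial information entity recognition questions, 0.8480 on the spatial information anomaly recognition questions, 0.3471 on the spatial orientation information reasoning questions, and 0.5631 on the questions about spatial expressions that differ in form but share the same meaning, with an overall test-set accuracy of 0.6024, ranking first.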

2023

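This paper describes the system submitted by team “翼智团” to the CCL23 Ancient Book Named Entity Recognition evaluation. The task aims to automatically identify important entities in ancient book texts that form the basic elements of events, such as person names, book titles, and official titles, and it is divided into an open track and a closed track according to whether the model used has more than 10B parameters. For this task, we first use domain data and task data related to ancient books to perform continual pre-training and fine-tuning of open-source pre-trained models, which significantly improves the performance of the base model on ancient book named entity recognition. We then propose a low-confidence entity filtering algorithm based on pair-wise voting to obtain candidate entities, and correct the recognition of these candidates with a context augmentation strategy. In the final evaluation, our system ranked second in the closed track with an F1 score of 95.8727.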
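The abstract does not spell out the pair-wise voting algorithm, so the following is only one possible reading, sketched in plain Python: every pair of models is compared, and any entity that a pair disagrees on becomes a candidate for later correction. The function name, the `(start, end, type)` span format, and the tag names are hypothetical.

```python
from itertools import combinations

def low_confidence_entities(model_predictions):
    """Collect entities that at least one pair of models disagrees on.

    Each element of model_predictions is a set of (start, end, type) spans
    predicted by one model for the same sentence.
    """
    candidates = set()
    for preds_a, preds_b in combinations(model_predictions, 2):
        # The symmetric difference holds spans predicted by only one of the two.
        candidates |= preds_a ^ preds_b
    return candidates

# Hypothetical predictions from three models for one sentence.
predictions = [
    {(0, 2, "PER"), (5, 7, "OFI")},
    {(0, 2, "PER"), (5, 7, "BOOK")},
    {(0, 2, "PER"), (5, 7, "OFI")},
]
uncertain = low_confidence_entities(predictions)
```

Here the span `(0, 2, "PER")` is agreed on by all models and is left alone, while both labelings of the span at positions 5 to 7 are flagged for a second look.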