Pusheng Chen
2026
Dawn at SemEval-2026 Task 8: Structured Control Decomposition for Faithful Multi-Turn Retrieval-Augmented Generation
Feiling Li | Xiaoya Qi | Xunyue Wang | Pusheng Chen | Zhiwen Tang | Han Yang
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Multi-turn Retrieval-Augmented Generation faces structural challenges that go beyond single-turn retrieval and fusion. Context-dependent queries, cross-turn evidence accumulation, and uncertain answerability jointly affect retrieval quality and generation reliability. We propose a structured control framework that formulates multi-turn RAG as a regulated reasoning process rather than a loosely coupled pipeline. The system first performs evidence and context structuring, extracting atomic facts strictly grounded in reference passages while reconstructing a self-contained query from dialogue history. It then conducts decision-conditioned generation, where explicit control signals regarding question intent, dialogue dependency, and answerability govern response feasibility, scope, and organization. By separating structural decision making from surface realization, the framework enforces consistent information flow across stages and reduces hallucination. Experiments on SemEval-2026 Task 8 show that our approach achieves strong faithfulness and stable overall performance, ranking 17/26 on Task B (generation, H=0.6333).
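The separation between structural decisions and surface realization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ControlSignals` and `plan_response`, the field names, and the "abstain" and "answer" modes are hypothetical, chosen only to show how explicit control signals might gate generation before any text is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignals:
    """Hypothetical container for the three signals named in the abstract."""
    intent: str               # question intent, e.g. "factoid" (illustrative label)
    depends_on_history: bool  # whether the turn relies on earlier dialogue turns
    answerable: bool          # whether the retrieved evidence supports an answer


def plan_response(signals: ControlSignals, atomic_facts: list[str]) -> dict:
    """Decide feasibility and scope before surface realization.

    Assumption: an unanswerable turn yields an abstention plan with no facts,
    so a downstream generator has nothing ungrounded to draw on.
    """
    if not signals.answerable:
        return {"mode": "abstain", "intent": signals.intent, "facts": []}
    # Only the extracted, passage-grounded facts are passed on to generation.
    return {"mode": "answer", "intent": signals.intent, "facts": list(atomic_facts)}
```

In this sketch a generator would receive only the returned plan, which is one way to keep the decision stage and the realization stage from leaking information into each other.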
2025
System Report for CCL25-Eval Task 8: ClinSplitFT: Enhancing ICD Coding in Chinese EMRs with Prompt Engineering and Candidate Set Splitting
Pusheng Chen | Qiangyu Tan | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
CCL25-Eval Task 8 focuses on ICD coding from clinical narratives. The challenge of this task lies in the imbalanced and complex label space, with primary diagnoses having a small, focused set of labels and secondary diagnoses involving a much larger, intricate set. To address these challenges, we propose ClinSplitFT (Clinical Code Split Fine-Tuning), a novel framework that enhances ICD coding accuracy using large language models (LLMs). The key innovation of ClinSplitFT is its candidate set split strategy, which splits the full candidate set into several manageable subsets and fine-tunes the model separately on each. During inference, predictions from all subsets are aggregated to produce the final output. This split-based fine-tuning approach enables more focused learning and better generalization in multi-label settings, making it an effective solution for clinical code prediction at scale. Experimental results show significant improvements in ICD coding performance. The code for our system is publicly available at https://github.com/277CPS/ICD-Code-prediction.
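The candidate set split strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the released ClinSplitFT code: the function names are hypothetical, the split is a plain fixed-size chunking, and aggregation is assumed to be an order-preserving union, since the abstract does not specify how predictions are merged.

```python
from typing import Iterable


def split_candidates(codes: list[str], subset_size: int) -> list[list[str]]:
    """Split the full candidate code list into consecutive subsets.

    Assumption: fixed-size chunks; the actual system may group codes differently.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    return [codes[i:i + subset_size] for i in range(0, len(codes), subset_size)]


def aggregate_predictions(per_subset: Iterable[Iterable[str]]) -> list[str]:
    """Merge per-subset predictions into one list, dropping duplicates.

    Assumption: aggregation is a union that keeps first-seen order.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for predictions in per_subset:
        for code in predictions:
            if code not in seen:
                seen.add(code)
                merged.append(code)
    return merged
```

Under these assumptions, each subset returned by `split_candidates` would be paired with its own fine-tuned model, and the per-model outputs would be passed to `aggregate_predictions` at inference time.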