Yan Shu
2026
Scaling is Not All You Need: Clinical-Oriented Reinforcement Learning Makes Parameter-Efficient Clinical Reasoning
Chi Liu | Yan Shu | Mengzhuo Chen | Hongming Piao | Zhijian Duan | Derek Li | Bryan Dai
Findings of the Association for Computational Linguistics: ACL 2026
While large language models show promise in medical applications, achieving expert-level clinical reasoning efficiently remains challenging due to the need for massive amounts of manually labeled data and large-scale models. To address this challenge, we propose Clinical-Oriented Reinforcement Learning (CORL), the first fully open-source, end-to-end reinforcement learning training pipeline in the clinical reasoning domain, incorporating a Reasoning-Oriented Data Strategy (RODS) based on topological synthesis, CoT cold-start, and two-stage reinforcement learning. Through CORL, we trained the Fleming-R1 series of models. Among them, Fleming-R1-7B significantly outperforms models of comparable size while approaching or even surpassing certain 32B and 72B models. Fleming-R1-32B achieves near-parity with GPT-4o and outperforms the strongest open-source alternatives up to 671B on MedXpertQA. This demonstrates that in the clinical reasoning field, a meticulously designed training pipeline holds greater importance than scaling model size alone. Data and models are available at https://github.com/UbiquantAI/Fleming-R1 and https://huggingface.co/collections/IQuestLab/fleming.
2017
Deep Automated Multi-task Learning
Davis Liang | Yan Shu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Multi-task learning (MTL) has recently contributed to learning better representations in service of various NLP tasks. MTL aims at improving the performance of a primary task by jointly training on a secondary task. This paper introduces automated tasks, which exploit the sequential nature of the input data, as secondary tasks in an MTL model. We explore next word prediction, next character prediction, and missing word completion as potential automated tasks. Our results show that training on a primary task in parallel with a secondary automated task improves both the convergence speed and accuracy for the primary task. We suggest two methods for augmenting an existing network with automated tasks and establish better performance in topic prediction, sentiment analysis, and hashtag recommendation. Finally, we show that the MTL models can perform well on datasets that are small and colloquial by nature.