Zhiyin Yu
2026
Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning
Zhiyin Yu | Bo Zhang | Qibin Hou | Zhonghai Wu | Xiao Luo | Lei Bai
Findings of the Association for Computational Linguistics: ACL 2026
Previous LLM-based RL studies typically follow either supervised learning with high annotation costs or unsupervised paradigms using voting- or entropy-based rewards. However, their performance remains far from satisfactory because of the substantial annotation cost and issues such as model collapse or reward hacking. To address these issues, we introduce a new perspective inspired by cognitive learning theory and propose a novel approach called EasyRL. The core of EasyRL is to simulate the human cognitive acquisition curve by integrating reliable knowledge transfer from easy labeled data with a progressive divide-and-conquer strategy that tackles increasingly difficult unlabeled data. Specifically, we initialize a warm-up model using supervised RL with few-shot labeled data. This is followed by a divide-and-conquer pseudo-labeling strategy on difficult unlabeled data, combining consistency-based selection for low-uncertainty cases and reflection-based resolution for medium-uncertainty cases. Finally, difficulty-progressive self-training with iterative pseudo-labeling and RL further strengthens the model’s reasoning capability. EasyRL provides a unified self-evolving framework that facilitates data-efficient post-training of LLMs. Experimental results on mathematical and scientific benchmarks demonstrate that EasyRL, using only 10% of easy labeled data, consistently outperforms state-of-the-art baselines.
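The divide-and-conquer pseudo-labeling step above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that uncertainty is measured by how often the majority answer appears among several sampled answers to the same question. The thresholds (0.8 and 0.5), the function names `partition_by_consistency`, `resolve_medium` and `toy_reflect`, and the idea of a `reflect` callback standing in for reflection-based resolution are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical thresholds on the majority-answer agreement ratio (assumptions).
LOW_UNCERTAINTY = 0.8     # agreement >= 0.8: accept the majority answer as-is
MEDIUM_UNCERTAINTY = 0.5  # 0.5 <= agreement < 0.8: send to reflection


def partition_by_consistency(
    samples: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Split unlabeled questions into accepted, medium-uncertainty, and discarded."""
    accepted: Dict[str, str] = {}
    medium: Dict[str, List[str]] = {}
    discarded: List[str] = []
    for qid, answers in samples.items():
        if not answers:
            discarded.append(qid)
            continue
        answer, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)
        if agreement >= LOW_UNCERTAINTY:
            accepted[qid] = answer  # consistency-based selection
        elif agreement >= MEDIUM_UNCERTAINTY:
            medium[qid] = answers  # defer to reflection-based resolution
        else:
            discarded.append(qid)  # too uncertain for a pseudo-label
    return accepted, medium, discarded


def resolve_medium(
    medium: Dict[str, List[str]],
    reflect: Callable[[str, List[str]], Optional[str]],
) -> Dict[str, str]:
    """Apply a reflection callback; it may return None to reject a case."""
    resolved: Dict[str, str] = {}
    for qid, answers in medium.items():
        label = reflect(qid, answers)
        if label is not None:
            resolved[qid] = label
    return resolved


def toy_reflect(qid: str, answers: List[str]) -> Optional[str]:
    # Placeholder only: a real system would re-prompt the model to check its
    # candidate answers. Here we keep the majority answer if it has >= 3 votes.
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= 3 else None


samples = {
    "q1": ["12", "12", "12", "12", "12"],  # agreement 1.0 -> accepted
    "q2": ["7", "7", "7", "9", "9"],       # agreement 0.6 -> reflection
    "q3": ["1", "2", "3", "4", "5"],       # agreement 0.2 -> discarded
}
accepted, medium, discarded = partition_by_consistency(samples)
pseudo_labels = {**accepted, **resolve_medium(medium, toy_reflect)}
print(pseudo_labels)  # {'q1': '12', 'q2': '7'}
```

In this sketch, high-uncertainty items are discarded rather than forced into a label. This is one plausible way to limit the noisy-reward risks the abstract mentions. Difficulty-progressive self-training would then, presumably, repeat this labeling and RL loop on successively harder unlabeled data.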
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Zhiyin Yu | Yuchen Mou | Juncheng Yan | Junyu Luo | Chunchun Chen | Xing Wei | Yunhui Liu | Hongru Sun | Yuxing Zhang | Jun Xu | Yatao Bian | Ming Zhang | Wei Ye | Tieke He | Jie Yang | Guanjie Zheng | Zhonghai Wu | Bo Zhang | Lei Bai | Xiao Luo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
2025
scRAG: Hybrid Retrieval-Augmented Generation for LLM-based Cross-Tissue Single-Cell Annotation
Zhiyin Yu | Chao Zheng | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: ACL 2025
In recent years, large language models (LLMs) such as GPT-4 have demonstrated impressive potential in a wide range of fields, including biology, genomics, and healthcare. Numerous studies have attempted to apply pre-trained LLMs to single-cell data analysis within a single tissue. However, in cross-tissue cell annotation, LLMs often perform unsatisfactorily because they lack specialized biological knowledge about genes and tissues. In this paper, we introduce scRAG, a novel framework that incorporates advanced LLM-based RAG techniques into cross-tissue single-cell annotation. scRAG uses LLMs to retrieve structured triples from knowledge graphs and unstructured information about similar cells from a reference cell database, and then generates candidate cell types. The framework further refines its predictions by retrieving marker genes for both the candidate cells and the similar cells. Extensive experiments on a cross-tissue dataset demonstrate that scRAG outperforms various baselines, including generalist models, domain-specific methods, and trained classifiers. The source code is available at https://github.com/YuZhiyin/scRAG.
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
Bowen Li | Wenhan Wu | Ziwei Tang | Lin Shi | John Yang | Jinyang Li | Shunyu Yao | Chen Qian | Binyuan Hui | Qicheng Zhang | Zhiyin Yu | He Du | Ping Yang | Dahua Lin | Chao Peng | Kai Chen
Proceedings of the 31st International Conference on Computational Linguistics
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, and fall short of measuring the full spectrum of challenges posed by real-world programming activities. In this case study, we evaluate LLMs across the entire software development lifecycle with DevEval, which covers software design, environment setup, implementation, acceptance testing, and unit testing. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented in DevEval. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
Co-authors
- Xiao Luo 3
- Lei Bai 2
- Zhonghai Wu 2
- Bo Zhang 2
- Yatao Bian 1
- Chong Chen 1
- Chunchun Chen 1
- Kai Chen 1
- He Du 1
- Tieke He 1
- Qibin Hou 1
- Xian-Sheng Hua 1
- Binyuan Hui 1
- Bowen Li 1
- Jinyang Li 1
- Dahua Lin 1
- Yunhui Liu 1
- Junyu Luo 1
- Yuchen Mou 1
- Chao Peng 1
- Chen Qian 1
- Lin Shi 1
- Hongru Sun 1
- Ziwei Tang 1
- Xing Wei 1
- Wenhan Wu 1
- Jun Xu 1
- Juncheng Yan 1
- Jie Yang 1
- John Yang 1
- Ping Yang 1
- Shunyu Yao 1
- Wei Ye 1
- Ming Zhang 1
- Qicheng Zhang 1
- Yuxing Zhang 1
- Chao Zheng 1
- Guanjie Zheng 1