Zhiyin Yu
2026
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Zhiyin Yu | Yuchen Mou | Juncheng Yan | Junyu Luo | Chunchun Chen | Xing Wei | Yunhui Liu | Hongru Sun | Yuxing Zhang | Jun Xu | Yatao Bian | Ming Zhang | Wei Ye | Tieke He | Jie Yang | Guanjie Zheng | Zhonghai Wu | Bo Zhang | Lei Bai | Xiao Luo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
2025
scRAG: Hybrid Retrieval-Augmented Generation for LLM-based Cross-Tissue Single-Cell Annotation
Zhiyin Yu | Chao Zheng | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: ACL 2025
In recent years, large language models (LLMs) such as GPT-4 have demonstrated impressive potential in a wide range of fields, including biology, genomics and healthcare. Numerous studies have attempted to apply pre-trained LLMs to single-cell data analysis within one tissue. However, when it comes to cross-tissue cell annotation, LLMs often suffer from unsatisfactory performance due to the lack of specialized biological knowledge regarding genes and tissues. In this paper, we introduce scRAG, a novel framework that incorporates advanced LLM-based RAG techniques into cross-tissue single-cell annotation. scRAG utilizes LLMs to retrieve structured triples from knowledge graphs and unstructured similar cell information from the reference cell database, and it generates candidate cell types. The framework further optimizes predictions by retrieving marker genes from both candidate cells and similar cells to refine its results. Extensive experiments on a cross-tissue dataset demonstrate that our scRAG framework outperforms various baselines, including generalist models, domain-specific methods, and trained classifiers. The source code is available at https://github.com/YuZhiyin/scRAG.
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
Bowen Li | Wenhan Wu | Ziwei Tang | Lin Shi | John Yang | Jinyang Li | Shunyu Yao | Chen Qian | Binyuan Hui | Qicheng Zhang | Zhiyin Yu | He Du | Ping Yang | Dahua Lin | Chao Peng | Kai Chen
Proceedings of the 31st International Conference on Computational Linguistics
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, encompassing stages including software design, environment setup, implementation, acceptance testing, and unit testing. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
Co-authors
- Xiao Luo 2
- Lei Bai 1
- Yatao Bian 1
- Chong Chen 1
- Chunchun Chen 1
- Kai Chen 1
- He Du 1
- Tieke He 1
- Xian-Sheng Hua 1
- Binyuan Hui 1
- Bowen Li 1
- Jinyang Li 1
- Dahua Lin 1
- Yunhui Liu 1
- Junyu Luo 1
- Yuchen Mou 1
- Chao Peng 1
- Chen Qian 1
- Lin Shi 1
- Hongru Sun 1
- Ziwei Tang 1
- Xing Wei 1
- Wenhan Wu 1
- Zhonghai Wu 1
- Jun Xu 1
- Juncheng Yan 1
- Jie Yang 1
- John Yang 1
- Ping Yang 1
- Shunyu Yao 1
- Wei Ye 1
- Bo Zhang 1
- Ming Zhang 1
- Qicheng Zhang 1
- Yuxing Zhang 1
- Chao Zheng 1
- Guanjie Zheng 1