Wei Ye
2026
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Zhiyin Yu | Yuchen Mou | Juncheng Yan | Junyu Luo | Chunchun Chen | Xing Wei | Yunhui Liu | Hongru Sun | Yuxing Zhang | Jun Xu | Yatao Bian | Ming Zhang | Wei Ye | Tieke He | Jie Yang | Guanjie Zheng | Zhonghai Wu | Bo Zhang | Lei Bai | Xiao Luo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.
2025
MARK: Multi-agent Collaboration with Ranking Guidance for Text-attributed Graph Clustering
Yiwei Fu | Yuxing Zhang | Chunchun Chen | Jianwen Ma | Quan Yuan | Rong-Cheng Tu | Xinli Huang | Wei Ye | Xiao Luo | Minghua Deng
Findings of the Association for Computational Linguistics: ACL 2025
This paper studies the problem of text-attributed graph clustering, which aims to assign each node to a group using both textual attributes and structural information. Although graph neural networks (GNNs) have been proposed to solve this problem, their performance is usually limited on uncertain nodes near the cluster boundaries, owing to label scarcity. In this paper, we introduce a new perspective of leveraging large language models (LLMs) to enhance text-attributed graph clustering and develop a novel approach named Multi-agent Collaboration with Ranking Guidance (MARK). The core of MARK is to generate reliable guidance, in the form of ranking-based supervision signals, through the collaboration of three LLM-based agents. In particular, we first perform coarse graph clustering and use a concept agent to induce the semantics of each cluster. Then, we measure each node's robustness under perturbations to identify uncertain nodes, and use a generation agent to produce synthetic text that closely aligns with their topology. An inference agent then provides the ranking semantics for each uncertain node in comparison to its synthetic counterpart. Feedback that is consistent between the uncertain and synthetic texts is treated as reliable guidance for fine-tuning the clustering model with a ranking-based supervision objective. Experimental results on various benchmark datasets validate the effectiveness of the proposed MARK compared with competing baselines.
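The two mechanical steps named in the MARK abstract above, flagging uncertain nodes by their robustness under perturbation and training with a ranking objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the perturbation or the loss, so Gaussian noise on node embeddings, the label-agreement stability score, and the hinge-style margin loss are all assumptions, as are the function names `soft_assign`, `uncertain_nodes`, and `ranking_loss` and every default parameter value.

```python
import numpy as np


def soft_assign(emb, centroids):
    """Soft cluster assignment: softmax over negative squared distances."""
    d2 = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def uncertain_nodes(emb, centroids, n_trials=50, noise=0.5, threshold=0.9, seed=0):
    """Flag nodes whose hard label changes under perturbation.

    Assumption: the perturbation is Gaussian noise added to the embeddings;
    the abstract does not say which perturbation MARK uses.
    """
    rng = np.random.default_rng(seed)
    base = soft_assign(emb, centroids).argmax(1)
    agree = np.zeros(len(emb))
    for _ in range(n_trials):
        noisy = emb + rng.normal(scale=noise, size=emb.shape)
        agree += soft_assign(noisy, centroids).argmax(1) == base
    stability = agree / n_trials  # fraction of trials that keep the base label
    return np.flatnonzero(stability < threshold), stability


def ranking_loss(p, preferred, other, margin=0.2):
    """Hinge loss that is zero once p[preferred] exceeds p[other] by `margin`.

    Assumption: a generic margin ranking loss stands in for the paper's
    ranking-based supervision objective.
    """
    return max(0.0, margin - (p[preferred] - p[other]))


if __name__ == "__main__":
    centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
    emb = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 0.0]])  # last node sits on the boundary
    idx, stability = uncertain_nodes(emb, centroids)
    print("uncertain node indices:", idx)
    print("stability scores:", stability)
```

In this toy setup the node placed midway between the two centroids is the only one whose label is unstable, so it is the one that would be passed on to the LLM agents for ranking feedback; the ranking preference itself (which cluster is `preferred`) is left as an input because it comes from the agents rather than from this code.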