Plug-and-Play Data Module for Code RL: Adaptive Ambiguity Replay
Jianqing Zhang, Wei Xia, Zhezheng Hao, Hong Wang, Hande Dong, Qiang Lin, Yang Liu, Jian Cao, Qiang Yang
Abstract
Reinforcement learning (RL) is effective for improving code generation but suffers from data scarcity. While experience replay mitigates this, existing approaches rely on static, in-epoch metrics that overlook training dynamics, often introducing low-utility or outdated data. Analyzing RL dynamics via dataset cartography, we observe that “ambiguous” samples, which are vital for model generalization, rapidly migrate to “easy-to-learn” regions, diminishing their training value. To address this, we propose Adaptive Ambiguity Replay (A2R) for RL, a plug-and-play module that prioritizes cross-epoch ambiguous samples. To neutralize the noise from stale experiences, A2R incorporates an adaptive importance mechanism based on policy divergence to weigh replayed rollouts. Extensive experiments on nine LLMs (3B–14B) demonstrate that A2R outperforms state-of-the-art baselines on real-world code editing tasks across both unseen and learned domains. Our results highlight cross-epoch ambiguity as a key factor for effective replay in RL. Code: https://github.com/TsingZ0/verl-A2R
- Anthology ID:
- 2026.findings-acl.886
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17865–17875
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.886/
- Cite (ACL):
- Jianqing Zhang, Wei Xia, Zhezheng Hao, Hong Wang, Hande Dong, Qiang Lin, Yang Liu, Jian Cao, and Qiang Yang. 2026. Plug-and-Play Data Module for Code RL: Adaptive Ambiguity Replay. In Findings of the Association for Computational Linguistics: ACL 2026, pages 17865–17875, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Plug-and-Play Data Module for Code RL: Adaptive Ambiguity Replay (Zhang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.886.pdf
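
Illustrative sketch (not the authors' implementation): the abstract describes two ideas, prioritizing samples that stay ambiguous across epochs and down-weighting stale replayed rollouts according to policy divergence. The Python below is a minimal sketch of how these could fit together, under several assumptions that do not come from the paper: ambiguity is measured in the dataset-cartography style as the standard deviation of a sample's per-epoch pass rate, policy divergence is approximated by the mean log-probability gap between the old and current policy on a rollout, and the weight is an exponential decay of that gap. The names `ambiguity_scores`, `select_ambiguous`, `replay_weight`, and `temperature` are hypothetical.

```python
import math
from statistics import mean, pstdev


def ambiguity_scores(pass_rates_by_epoch):
    """Return {sample_id: (confidence, variability)} across epochs.

    Assumption: confidence is the mean per-epoch pass rate and variability
    is its population standard deviation, as in dataset cartography.
    """
    return {
        sample_id: (mean(rates), pstdev(rates))
        for sample_id, rates in pass_rates_by_epoch.items()
    }


def select_ambiguous(pass_rates_by_epoch, k):
    """Pick the k samples with the highest cross-epoch variability."""
    scores = ambiguity_scores(pass_rates_by_epoch)
    ranked = sorted(scores, key=lambda s: scores[s][1], reverse=True)
    return ranked[:k]


def replay_weight(old_logprobs, new_logprobs, temperature=1.0):
    """Weight a replayed rollout by how far the policy has drifted.

    Assumption: drift is the absolute mean token log-prob gap between the
    policy that generated the rollout and the current policy; the weight
    decays exponentially with it, so stale rollouts count for less.
    """
    gap = mean([o - n for o, n in zip(old_logprobs, new_logprobs)])
    return math.exp(-abs(gap) / temperature)


# Hypothetical per-epoch pass rates for four training samples.
history = {
    "a": [0.0, 0.5, 1.0],    # swings widely: ambiguous
    "b": [1.0, 1.0, 1.0],    # always solved: easy-to-learn
    "c": [0.0, 0.0, 0.0],    # never solved: hard-to-learn
    "d": [0.25, 0.75, 0.25], # moderately ambiguous
}
replay_ids = select_ambiguous(history, k=2)
fresh = replay_weight([-1.0, -1.0], [-1.0, -1.0])
stale = replay_weight([-1.0, -1.0], [-2.0, -2.0])
```

In this toy setup the always-solved and never-solved samples have zero variability and are never selected for replay, while a rollout whose log-probabilities have shifted under the current policy receives a weight below that of an unchanged one. The paper's actual selection criterion and weighting function may differ; the linked repository is the authoritative reference.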