ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun


Abstract
Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verification and efficiency under high-concurrency workloads. We present ScaleBox, a high-fidelity and scalable system designed to address these limitations in large-scale code training. ScaleBox introduces automated special-judge generation and management, fine-grained parallel execution across test cases with seamless multi-node coordination, and a configuration-driven evaluation suite for reproducible benchmarking. A series of experiments demonstrates that ScaleBox significantly enhances code verification accuracy and efficiency. Our further RLVR experiments show that ScaleBox substantially improves both performance on LiveCodeBench and training stability, significantly outperforming heuristic-matching baselines. By providing a reliable and high-throughput infrastructure, ScaleBox facilitates more effective research and development in large-scale code training.
Anthology ID:
2026.acl-demo.30
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Greg Durrett, Ping Jian
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
307–317
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.30/
DOI:
Bibkey:
Cite (ACL):
Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, and Le Sun. 2026. ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 307–317, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models (Zheng et al., ACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.30.pdf