Dong Huang
2025
FoldMoE: Efficient Long Sequence MoE Training via Attention-MoE Pipelining
Guichao Zhu | Lintian Lei | Yuhao Qing | Yichao Fu | Fanxin Li | Dong Huang | Zekai Sun | Heming Cui
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Training LLMs with Mixture-of-Experts (MoE) architecture on long sequences poses significant challenges due to the all-to-all communication bottleneck of expert parallelism. While existing approaches attempt to hide the communication costs in computation through token-level pipelining within MoE layers, their effectiveness is limited by the insufficient computation available within the MoE layers to overlap with. We present FoldMoE, a high-performance MoE training system that enables token-level overlapping across entire Transformer blocks through novel attention-MoE pipelining. We propose an efficient pipeline schedule, and a novel token buffering design to decouple attention and MoE layer partitioning, along with a time-uniform micro-batching strategy for enhanced efficiency. Evaluations on GPT-MoE models with sequences up to 32K tokens show that FoldMoE achieves up to 1.49x and 2.72x speedup over state-of-the-art token-level overlapping and non-overlapping baselines respectively.
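As an illustration only (not FoldMoE's code or API), here is a minimal Python sketch of the attention-MoE overlapping idea the abstract describes: the tokens of a Transformer block are split into micro-batches, and while one micro-batch's expert-parallel all-to-all "communication" is in flight, the next micro-batch's attention compute runs, so communication is hidden behind computation across the whole block. All names (attention, moe_all_to_all, fold_pipeline) are hypothetical placeholders.

# Hypothetical sketch of token-level attention-MoE pipelining.
from concurrent.futures import ThreadPoolExecutor
import time

def attention(chunk):          # stand-in for attention compute on one micro-batch
    time.sleep(0.01)
    return chunk

def moe_all_to_all(chunk):     # stand-in for expert-parallel all-to-all communication
    time.sleep(0.01)
    return chunk

def fold_pipeline(token_chunks):
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None                      # all-to-all still in flight for previous chunk
        for chunk in token_chunks:
            attended = attention(chunk)     # this compute overlaps the pending communication
            if pending is not None:
                outputs.append(pending.result())
            pending = comm.submit(moe_all_to_all, attended)
        if pending is not None:
            outputs.append(pending.result())
    return outputs

print(fold_pipeline([list(range(8 * i, 8 * (i + 1))) for i in range(4)]))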
CodeArena: A Collective Evaluation Platform for LLM Code Generation
Mingzhe Du | Anh Tuan Luu | Bin Ji | Xiaobao Wu | Yuhao Qing | Dong Huang | Terry Yue Zhuo | Qian Liu | See-Kiong Ng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Large Language Models (LLMs) have reshaped code generation by synergizing their exceptional comprehension of natural language and programming syntax, thereby substantially boosting developer productivity. These advancements have prompted numerous efforts to quantitatively evaluate their coding capabilities. However, persistent challenges, such as benchmark leakage, data dissipation, and limited system accessibility, continue to impede a timely and accurate assessment. To address these limitations, we introduce CodeArena, an online evaluation framework tailored for LLM code generation. Its key innovation is a collective evaluation mechanism, which dynamically recalibrates individual model scores based on the holistic performance of all participating models, mitigating score biases caused by widespread benchmark leakage. In addition, CodeArena ensures open access to all submitted solutions and test cases and provides automation-friendly APIs to streamline the code evaluation workflow. Our main contributions are: (1) a collective evaluation system for unbiased assessment, (2) a public repository of solutions and test cases, and (3) automation-ready APIs for seamless integration.
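As an illustration only (not CodeArena's actual scoring code), a minimal Python sketch of one way a collective recalibration like the one described could work: a model's credit on each problem is down-weighted by how many other participating models also solve it, so problems that nearly everyone solves (for example, ones likely leaked into training data) contribute less to the final score. The function and dictionary names are hypothetical.

# Hypothetical sketch of a collective score recalibration.
def collective_scores(results):
    """results[model][problem] -> 1.0 if solved else 0.0 (all models share the same problems)."""
    models = list(results)
    problems = list(next(iter(results.values())))
    n = len(models)
    scores = {}
    for m in models:
        total = 0.0
        for p in problems:
            solved_by = sum(results[other][p] for other in models)
            weight = 1.0 - solved_by / n      # widely solved (possibly leaked) problems weigh less
            total += results[m][p] * weight
        scores[m] = total / len(problems)
    return scores

example = {
    "model_a": {"p1": 1.0, "p2": 1.0, "p3": 0.0},
    "model_b": {"p1": 1.0, "p2": 0.0, "p3": 0.0},
    "model_c": {"p1": 1.0, "p2": 1.0, "p3": 1.0},
}
print(collective_scores(example))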
Co-authors
- Yuhao Qing 2
- Heming Cui 1
- Mingzhe Du 1
- Yichao Fu 1
- Bin Ji 1
Venues
- ACL 2