2025
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang | Tianhao Cheng | Jason Klein Liu | Weidi Xu | Jiaran Hao | Liuyihan Song | Yang Xu | Jian Yang | Jiaheng Liu | Chenchen Zhang | Linzheng Chai | Ruifeng Yuan | Xianzhen Luo | Qiufeng Wang | YuanTao Fan | Qingfu Zhu | Zhaoxiang Zhang | Yang Gao | Jie Fu | Qian Liu | Houyi Li | Ge Zhang | Yuan Qi | Xu Yinghui | Wei Chu | Zili Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Code LLMs have been widely used in various domains, including code generation, logical reasoning, and agent systems. However, open-access code LLMs mostly only release weights, lacking key features such as reproducible data pipelines and transparent training protocols, which are crucial for advancing deeper, more reliable investigations. To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an “open cookbook” for the research community. Unlike most prior efforts, we release not only model weights and inference code, but also the reproducible training data, complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research. Our work identifies the key ingredients for building a top-tier code LLM: optimized heuristic rules for data cleaning and deduplication, effective recall of code-related text corpus, and high-quality synthetic data for both annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research and enable reproducible advancements in code intelligence. The released resource is available at https://opencoder-llm.github.io.
2024
E2-LLM: Efficient and Extreme Length Extension of Large Language Models
Jiaheng Liu | ZhiqiBai ZhiqiBai | Yuanxing Zhang | Chenchen Zhang | YuangZh YuangZh | Ge Zhang | JiakaiWang JiakaiWang | Haoran Que | Yukang Chen | Wenbo Su | Tiezheng Ge | Jie Fu | Wenhu Chen | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2024
Training Large Language Models (LLMs) to process extensive context lengths incurs prohibitive computational costs. Prevailing techniques for extending context capabilities in LLMs typically require not only additional training procedures but also access to datasets with long context (e.g., sequences of 32K tokens), presupposing substantial GPU expenditures. To address the aforementioned issues, we introduce a novel solution named Efficient and Extreme length extension for Large Language Models (E2-LLM). E2-LLM entails a singular training process over considerably short sequences (e.g., 4K tokens), which greatly mitigates the cost of continual-pretraining or fine-tuning. Within the training phase, we incorporate a dual augmentation strategy with Rotary Position Embeddings (RoPE) that adjusts the scale and position indices across distinct training samples. E2-LLM is meticulously designed to enhance the model’s robustness to diverse relative positions. The experimental results on multiple benchmark datasets demonstrate the superior performance of E2-LLM on demanding tasks of processing long contexts.
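The abstract describes the augmentation only at a high level (varying the RoPE scale and the position indices across training samples). As a rough illustration of that idea, and not the authors' implementation, the sketch below computes RoPE rotation angles where token i is assigned the position (i + offset) / scale. The function names, the parameters `scale`, `offset`, `max_scale`, `max_offset`, and the uniform integer sampling are all assumptions made for this example.

```python
import math
import random


def rope_angles(seq_len, head_dim, scale=1.0, offset=0, base=10000.0):
    """Rotation angles for each (position, frequency) pair.

    Assumption for this sketch: token i uses position (i + offset) / scale.
    """
    # Standard RoPE inverse frequencies: base^(-2j/d) for j = 0 .. d/2 - 1.
    inv_freq = [base ** (-2.0 * j / head_dim) for j in range(head_dim // 2)]
    return [
        [((i + offset) / scale) * f for f in inv_freq]
        for i in range(seq_len)
    ]


def sample_augmentation(rng, max_scale, max_offset):
    """Draw a hypothetical (scale, offset) pair for one training sample."""
    scale = rng.randint(1, max_scale)    # how strongly positions are compressed
    offset = rng.randint(0, max_offset)  # shift applied to the position indices
    return scale, offset


if __name__ == "__main__":
    rng = random.Random(0)
    scale, offset = sample_augmentation(rng, max_scale=8, max_offset=4096)
    angles = rope_angles(seq_len=4, head_dim=8, scale=scale, offset=offset)
    print(scale, offset, len(angles), len(angles[0]))
```

Under this formulation the angle difference between two tokens does not depend on the offset and shrinks in proportion to the scale, so short training sequences expose the model to a range of relative-position spacings.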
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
Yanan Wu | Jie Liu | Xingyuan Bu | Jiaheng Liu | Zhanhui Zhou | Yuanxing Zhang | Chenchen Zhang | ZhiqiBai ZhiqiBai | Haibin Chen | Tiezheng Ge | Wanli Ouyang | Wenbo Su | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2024
This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can be evaluated at different granularities with concept-wise accuracies. Using ConceptMath, we evaluate a broad range of LLMs and observe that existing LLMs, though achieving high average accuracies on traditional benchmarks, exhibit significant performance variations across different math concepts and may even fail catastrophically on the most basic ones. We also introduce an efficient fine-tuning strategy to address the weaknesses of existing LLMs. Finally, we hope ConceptMath can guide developers in understanding the fine-grained mathematical abilities of their models and facilitate the growth of foundation models. Code is available at https://github.com/conceptmath/conceptmath.
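To make the contrast between a single average accuracy and concept-wise accuracies concrete, here is a minimal sketch of scoring results at every level of a concept hierarchy. It is not taken from the ConceptMath repository; the record format (a concept path paired with a correctness flag) and the concept names are hypothetical.

```python
from collections import defaultdict


def concept_accuracies(records):
    """Accuracy for every node of a concept hierarchy.

    records: iterable of (concept_path, is_correct), where concept_path is a
    tuple ordered from coarse to fine, e.g. ("Algebra", "Inequalities").
    """
    totals = defaultdict(lambda: [0, 0])  # node -> [num_correct, num_total]
    for path, is_correct in records:
        # Credit the result to the leaf concept and to each of its ancestors.
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            totals[node][0] += int(is_correct)
            totals[node][1] += 1
    return {node: correct / total for node, (correct, total) in totals.items()}


if __name__ == "__main__":
    # Hypothetical results, for illustration only.
    records = [
        (("Algebra", "Linear equations"), True),
        (("Algebra", "Linear equations"), True),
        (("Algebra", "Inequalities"), False),
        (("Geometry", "Angles"), True),
    ]
    for node, acc in sorted(concept_accuracies(records).items()):
        print(" > ".join(node), round(acc, 2))
```

In this toy data the overall accuracy is 0.75, yet the per-concept view shows one leaf concept with every answer wrong, which is the kind of variation an average hides.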