2025
pdf
bib
abs
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
Qiming Ge
|
Shuhao Xing
|
Songyang Gao
|
Yunhua Zhou
|
Yicheng Zou
|
Songyang Zhang
|
Zhi Chen
|
Hang Yan
|
Qi Zhang
|
Qipeng Guo
|
Kai Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Scaling law builds the relationship between training computation and validation loss, enabling researchers to effectively predict the loss trending of models across different levels of computation. However, a gap still remains between validation loss and the model’s downstream capabilities, making it untrivial to apply scaling law to direct performance prediction for downstream tasks. The loss typically represents a cumulative penalty for predicted tokens, which are implicitly considered to have equal importance. Nevertheless, our studies have shown evidence that when considering different training data distributions, we cannot directly model the relationship between downstream capability and computation or token loss. To bridge the gap between validation loss and downstream task capabilities, in this work, we introduce Capability Salience Vector, which decomposes the overall loss and assigns different importance weights to tokens to assess a specific meta-capability, aligning the validation loss with downstream task performance in terms of the model’s capabilities. Experiments on various popular benchmarks demonstrate that our proposed Capability Salience Vector could significantly improve the predictability of language model performance on downstream tasks.
2024
pdf
bib
abs
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
Demin Song
|
Honglin Guo
|
Yunhua Zhou
|
Shuhao Xing
|
Yudong Wang
|
Zifan Song
|
Wenwei Zhang
|
Qipeng Guo
|
Hang Yan
|
Xipeng Qiu
|
Dahua Lin
Findings of the Association for Computational Linguistics: ACL 2024
The programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs’ performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-training corpora, we introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language. We conducted experiments on three code-focused LLMs and observed consistent improvements in performance on two widely-used programming skill benchmarks. Notably, the model trained on the augmented data outperformed both the model used for generating comments and the model further trained on the data without augmentation.
2023
pdf
bib
abs
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Kai Lv
|
Shuo Zhang
|
Tianle Gu
|
Shuhao Xing
|
Jiawei Hong
|
Keyu Chen
|
Xiaoran Liu
|
Yuqing Yang
|
Honglin Guo
|
Tengxiao Liu
|
Yu Sun
|
Qipeng Guo
|
Hang Yan
|
Xipeng Qiu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large language models (LLMs) are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, and LOMO. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has proven superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie.