Ranchi Zhao


2024

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
Chaoqun He | Renjie Luo | Shengding Hu | Ranchi Zhao | Jie Zhou | Hanghao Wu | Jiajie Zhang | Xu Han | Zhiyuan Liu | Maosong Sun
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Evaluation is pivotal for honing Large Language Models (LLMs), pinpointing their capabilities and guiding enhancements. The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment. However, due to the various implementation details to consider, developing a comprehensive evaluation platform is never easy. Existing platforms are often complex and poorly modularized, hindering seamless incorporation into researchers' workflows. This paper introduces UltraEval, a user-friendly evaluation framework characterized by its lightweight design, comprehensiveness, modularity, and efficiency. We identify and reimplement three core components of model evaluation (models, data, and metrics). The resulting composability allows for the free combination of different models, tasks, prompts, and metrics within a unified evaluation workflow. Additionally, UltraEval supports diverse models through a unified HTTP service and provides sufficient inference acceleration.

LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang | Hu Yiwen | Bingqian Li | Wenyang Luo | ZiJing Qin | Haoxiang Sun | Jiapeng Wang | Shiyi Xu | Xiaoxue Cheng | Geyang Guo | Han Peng | Bowen Zheng | Yiru Tang | Yingqian Min | Yushuo Chen | Jie Chen | Ranchi Zhao | Luran Ding | Yuhao Wang | Zican Dong | Xia Chunxuan | Junyi Li | Kun Zhou | Xin Zhao | Ji-Rong Wen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

To facilitate research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library has three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) practical considerations, especially user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments across a diverse range of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. A detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.

DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models
Ranchi Zhao | Zhen Leng Thai | Yifan Zhang | Shengding Hu | Jie Zhou | Yunqi Ba | Jie Cai | Zhiyuan Liu | Maosong Sun
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The performance of Large Language Models (LLMs) is substantially influenced by the pretraining corpus, which consists of vast quantities of unsupervised data processed by the models. Despite its critical role in model performance, ensuring the quality of this data is challenging due to its sheer volume and the absence of sample-level quality annotations and enhancements. In this paper, we introduce DecorateLM, a data engineering method designed to refine the pretraining corpus through data rating, tagging, and editing. Specifically, DecorateLM rates texts against quality criteria, tags texts with hierarchical labels, and edits texts into a more formalized format. Because the pretraining corpus is massive, using an LLM to decorate the entire corpus is inefficient. Therefore, to balance performance with efficiency, we curate a meticulously annotated training corpus for DecorateLM using a large language model and distill data engineering expertise into a compact 1.2-billion-parameter small language model (SLM). We then apply DecorateLM to enhance 100 billion tokens of the training corpus, selecting 45 billion tokens that exemplify high quality and diversity for the further training of another 1.2-billion-parameter LLM. Our results demonstrate that employing such high-quality data can significantly boost model performance, showcasing a powerful approach to enhancing the quality of the pretraining corpus.