CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li | Jianqiao Zhao | Duo Zheng | Zi-Yuan Hu | Zhi Chen | Xiaohui Su | Yongfeng Huang | Shijia Huang | Dahua Lin | Michael Lyu | Liwei Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model’s capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model’s performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs’ performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA’s efficacy.
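The abstract does not detail how the sampling strategy guarantees a unique subset for each leaderboard round. As a minimal sketch of one way such a guarantee could work, assuming a simple draw-without-replacement scheme (the function name `sample_unique_subsets` and all of its parameters are hypothetical, not CLEVA's actual implementation):

```python
import random

def sample_unique_subsets(pool, subset_size, num_rounds, seed=0):
    """Partition an evaluation pool into disjoint per-round subsets.

    Hypothetical illustration: drawing without replacement across rounds
    ensures no item appears in more than one leaderboard round, which
    limits the payoff of memorizing a previously released subset.
    """
    if subset_size * num_rounds > len(pool):
        raise ValueError("pool too small for the requested rounds")
    rng = random.Random(seed)          # fixed seed for reproducible rounds
    shuffled = pool[:]
    rng.shuffle(shuffled)
    # Consecutive, non-overlapping slices of the shuffled pool.
    return [shuffled[i * subset_size:(i + 1) * subset_size]
            for i in range(num_rounds)]

# Example: 3 rounds of 2 items from a 10-item pool; all subsets are disjoint.
rounds = sample_unique_subsets(list(range(10)), subset_size=2, num_rounds=3)
print(rounds)
```

Any real platform would also need to track which items have been released in past rounds; the sketch above sidesteps that by partitioning the whole pool up front under one seed.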