CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael Lyu, Liwei Wang
Abstract
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model’s capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model’s performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs’ performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA’s efficacy.
- Anthology ID: 2023.emnlp-demo.17
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Yansong Feng, Els Lefever
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 186–217
- URL: https://preview.aclanthology.org/ingest_wac_2008/2023.emnlp-demo.17/
- DOI: 10.18653/v1/2023.emnlp-demo.17
- Cite (ACL): Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael Lyu, and Liwei Wang. 2023. CLEVA: Chinese Language Models EVAluation Platform. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 186–217, Singapore. Association for Computational Linguistics.
- Cite (Informal): CLEVA: Chinese Language Models EVAluation Platform (Li et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/ingest_wac_2008/2023.emnlp-demo.17.pdf