FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC
Jing-Shu Zheng, Richeng Xuan, Bowen Qin, Zheqi He, Tongshuai Ren, Xuejing Li, Jin-Ge Yao, Xi Yang
Abstract
We introduce FlagEval-Arena, an evaluation platform for side-by-side comparisons of large language models and text-driven AIGC systems. In contrast to the well-known LM Arena (LMSYS Chatbot Arena), we reimplemented the framework ourselves, which gives us the flexibility to introduce new mechanisms and features. Our platform supports side-by-side evaluation not only of language models and vision-language models, but also of text-to-image and text-to-video synthesis. We specifically target a Chinese audience, with a stronger focus on the Chinese language, more models developed by Chinese institutions, and more general usage beyond the technical community. As a result, we currently observe notable differences from the results typically reported by LM Arena. Our platform is available at https://flageval.baai.org/#/arena.
- Anthology ID:
- 2025.acl-demo.56
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Pushkar Mishra, Smaranda Muresan, Tao Yu
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 583–591
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.56/
- Cite (ACL):
- Jing-Shu Zheng, Richeng Xuan, Bowen Qin, Zheqi He, Tongshuai Ren, Xuejing Li, Jin-Ge Yao, and Xi Yang. 2025. FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 583–591, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC (Zheng et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.56.pdf