FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC

Jing-Shu Zheng, Richeng Xuan, Bowen Qin, Zheqi He, Tongshuai.ren Tongshuai.ren, Xuejing Li, Jin-Ge Yao, Xi Yang


Abstract
We introduce FlagEval-Arena, an evaluation platform for side-by-side comparisons of large language models and text-driven AIGC systems. Unlike the well-known LM Arena (LMSYS Chatbot Arena), our platform is built on our own reimplemented framework, which gives us the flexibility to introduce new mechanisms and features. It supports side-by-side evaluation not only of language models and vision-language models, but also of text-to-image and text-to-video synthesis. We specifically target a Chinese audience, with a stronger focus on the Chinese language, more models developed by Chinese institutes, and more general usage beyond the technical community. As a result, we currently observe interesting differences from the results typically reported by LM Arena. Our platform is available at https://flageval.baai.org/#/arena.
Anthology ID:
2025.acl-demo.56
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Pushkar Mishra, Smaranda Muresan, Tao Yu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
583–591
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.56/
Cite (ACL):
Jing-Shu Zheng, Richeng Xuan, Bowen Qin, Zheqi He, Tongshuai.ren Tongshuai.ren, Xuejing Li, Jin-Ge Yao, and Xi Yang. 2025. FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 583–591, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC (Zheng et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.56.pdf
Copyright agreement:
 2025.acl-demo.56.copyright_agreement.pdf