Tongshuai.ren Tongshuai.ren
2025
FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC
Jing-Shu Zheng | Richeng Xuan | Bowen Qin | Zheqi He | Tongshuai.ren Tongshuai.ren | Xuejing Li | Jin-Ge Yao | Xi Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce FlagEval-Arena, an evaluation platform for side-by-side comparisons of large language models and text-driven AIGC systems. Unlike the well-known LM Arena (LMSYS Chatbot Arena), our platform is built on our own reimplemented framework, which gives us the flexibility to introduce new mechanisms or features. It enables side-by-side evaluation not only of language models and vision-language models, but also of text-to-image and text-to-video synthesis. We specifically target a Chinese audience, with more focus on the Chinese language, more models developed by Chinese institutes, and more general usage beyond the technical community. As a result, we currently observe interesting differences from the results usually presented by LM Arena. Our platform is available at https://flageval.baai.org/#/arena.