Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Haonan Li; Xudong Han; Zenan Zhai; Honglin Mu; Hao Wang (汪浩); Zhenxuan Zhang; Yilin Geng; Shom Lin; Renxi Wang; Artem Shelmanov; Xiangyu Qi; Yuxia Wang; Donghai Hong; Youliang Yuan; Meng Chen; Haoqin Tu; Fajri Koto; Cong Zeng; Tatsuki Kuribayashi; Rishabh Bhardwaj; Bingchen Zhao; Yawen Duan; Yi Liu; Emad A. Alghamdi; Yaodong Yang; Yinpeng Dong; Soujanya Poria; Pengfei Liu; Zhengzhong Liu; Hector Xuguang Ren; Eduard Hovy; Iryna Gurevych; Preslav Nakov; Monojit Choudhury; Timothy Baldwin

doi:10.18653/v1/2025.naacl-demo.23

Abstract

As large language models (LLMs) continue to evolve, leaderboards play a significant role in steering their development. Existing leaderboards often prioritize model capabilities while overlooking safety concerns, leaving a gap in responsible AI development. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the other. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
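The difference between averaging and a distance-to-optimal score can be illustrated with a minimal sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: scores are taken to lie in [0, 1], the optimum is taken to be the point (1, 1), and the distance is taken to be Euclidean, normalized by √2. The function names `average_score` and `distance_to_optimal_score` are hypothetical and are not taken from the Libra-Leaderboard code.

```python
import math


def average_score(capability: float, safety: float) -> float:
    """Plain arithmetic mean of the two dimensions."""
    return (capability + safety) / 2.0


def distance_to_optimal_score(capability: float, safety: float) -> float:
    """Illustrative distance-based score (assumed form, not the paper's code).

    Measures the Euclidean distance from (capability, safety) to the assumed
    ideal point (1, 1), divides by sqrt(2) so the result stays in [0, 1],
    and subtracts from 1 so that higher is better.
    """
    distance = math.hypot(1.0 - capability, 1.0 - safety)
    return 1.0 - distance / math.sqrt(2.0)


# Two hypothetical models with the same mean but different balance.
balanced = (0.7, 0.7)
lopsided = (1.0, 0.4)

for name, (cap, safe) in {"balanced": balanced, "lopsided": lopsided}.items():
    print(
        f"{name}: average={average_score(cap, safe):.3f}, "
        f"distance-based={distance_to_optimal_score(cap, safe):.3f}"
    )
```

Under these assumptions both hypothetical models have the same average, but the lopsided one sits farther from the ideal point and so receives the lower distance-based score, which is the balancing incentive the abstract describes.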

Anthology ID: 2025.naacl-demo.23
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Nouha Dziri, Sean (Xiang) Ren, Shizhe Diao
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 268–286
URL: https://preview.aclanthology.org/corrections-2025-06/2025.naacl-demo.23/
DOI: 10.18653/v1/2025.naacl-demo.23
Cite (ACL): Haonan Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, Zhenxuan Zhang, Yilin Geng, Shom Lin, Renxi Wang, Artem Shelmanov, Xiangyu Qi, Yuxia Wang, Donghai Hong, Youliang Yuan, Meng Chen, Haoqin Tu, Fajri Koto, Cong Zeng, Tatsuki Kuribayashi, Rishabh Bhardwaj, Bingchen Zhao, Yawen Duan, Yi Liu, Emad A. Alghamdi, Yaodong Yang, Yinpeng Dong, Soujanya Poria, Pengfei Liu, Zhengzhong Liu, Hector Xuguang Ren, Eduard Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, and Timothy Baldwin. 2025. Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), pages 268–286, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability (Li et al., NAACL 2025)
PDF: https://preview.aclanthology.org/corrections-2025-06/2025.naacl-demo.23.pdf
