Shom Lin
2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Haonan Li | Xudong Han | Zenan Zhai | Honglin Mu | Hao Wang | Zhenxuan Zhang | Yilin Geng | Shom Lin | Renxi Wang | Artem Shelmanov | Xiangyu Qi | Yuxia Wang | Donghai Hong | Youliang Yuan | Meng Chen | Haoqin Tu | Fajri Koto | Cong Zeng | Tatsuki Kuribayashi | Rishabh Bhardwaj | Bingchen Zhao | Yawen Duan | Yi Liu | Emad A. Alghamdi | Yaodong Yang | Yinpeng Dong | Soujanya Poria | Pengfei Liu | Zhengzhong Liu | Hector Xuguang Ren | Eduard Hovy | Iryna Gurevych | Preslav Nakov | Monojit Choudhury | Timothy Baldwin
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
As large language models (LLMs) continue to evolve, leaderboards play a significant role in steering their development. Existing leaderboards often prioritize model capabilities while overlooking safety concerns, leaving a significant gap in responsible AI development. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the other. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
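The contrast between averaging and a distance-to-optimal score can be illustrated with a minimal sketch. It assumes both scores are normalized to [0, 1] and that the overall score is one minus the Euclidean distance to the ideal point (1, 1), divided by the largest possible distance. The abstract does not state the exact formula, so the function names and the normalization here are illustrative assumptions rather than the leaderboard's implementation.

```python
import math


def average_score(capability: float, safety: float) -> float:
    """Plain mean of the two scores (the traditional approach)."""
    return (capability + safety) / 2


def distance_to_optimal_score(capability: float, safety: float) -> float:
    """Hypothetical balanced score: 1 minus the normalized Euclidean
    distance from (capability, safety) to the ideal point (1, 1).

    Assumes both inputs are already scaled to [0, 1].
    """
    for value in (capability, safety):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    distance = math.sqrt((1 - capability) ** 2 + (1 - safety) ** 2)
    # sqrt(2) is the distance from (0, 0) to (1, 1), the worst case.
    return 1 - distance / math.sqrt(2)


# Two made-up models with the same mean but different balance.
balanced = distance_to_optimal_score(0.7, 0.7)
lopsided = distance_to_optimal_score(0.9, 0.5)
print(average_score(0.7, 0.7), average_score(0.9, 0.5))
print(balanced, lopsided)
```

Under these assumptions, the two made-up models tie on the mean, while the distance-based score ranks the balanced model above the lopsided one, because squaring the shortfalls penalizes a large gap in one dimension more than two small gaps.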
2024
A Chinese Dataset for Evaluating the Safeguards in Large Language Models
Yuxia Wang | Zenan Zhai | Haonan Li | Xudong Han | Shom Lin | Zhenxuan Zhang | Angela Zhao | Preslav Nakov | Timothy Baldwin
Findings of the Association for Computational Linguistics: ACL 2024
Many studies have demonstrated that large language models (LLMs) can produce harmful responses, exposing users to unexpected risks. Previous studies have proposed comprehensive taxonomies of LLM risks, as well as corresponding prompts that can be used to examine LLM safety. However, the focus has been almost exclusively on English. We aim to broaden LLM safety research by introducing a dataset for the safety evaluation of Chinese LLMs, and extending it to better identify false negative and false positive examples in terms of risky prompt rejections. We further present a set of fine-grained safety assessment criteria for each risk type, facilitating both manual annotation and automatic evaluation in terms of LLM response harmfulness. Our experiments over five LLMs show that region-specific risks are the prevalent risk type. Warning: this paper contains example data that may be offensive, harmful, or biased. Our data is available at https://github.com/Libr-AI/do-not-answer.
Co-authors
- Timothy Baldwin 2
- Xudong Han 2
- Haonan Li 2
- Preslav Nakov 2
- Yuxia Wang 2
- Zenan Zhai 2
- Zhenxuan Zhang 2
- Emad A. Alghamdi 1
- Rishabh Bhardwaj 1
- Meng Chen 1
- Monojit Choudhury 1
- Yinpeng Dong 1
- Yawen Duan 1
- Yilin Geng 1
- Iryna Gurevych 1
- Donghai Hong 1
- Eduard Hovy 1
- Fajri Koto 1
- Tatsuki Kuribayashi 1
- Yi Liu 1
- Pengfei Liu 1
- Zhengzhong Liu 1
- Honglin Mu 1
- Soujanya Poria 1
- Xiangyu Qi 1
- Hector Xuguang Ren 1
- Artem Shelmanov 1
- Haoqin Tu 1
- Hao Wang 1
- Renxi Wang 1
- Yaodong Yang (杨耀东) 1
- Youliang Yuan 1
- Cong Zeng 1
- Angela Zhao 1
- Bingchen Zhao 1