Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation

Jing Yao; Xiaoyuan Yi; Shitong Duan; Jindong Wang; Yuzhuo Bai; Muhua Huang; Yang Ou; Scarlett Li; Peng Zhang; Tun Lu; Zhicheng Dou (窦志成); Maosong Sun (孙茂松); James Evans; Xing Xie

Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation

Jing Yao, Xiaoyuan Yi, Shitong Duan, Jindong Wang, Yuzhuo Bai, Muhua Huang, Yang Ou, Scarlett Li, Peng Zhang, Tun Lu, Zhicheng Dou, Maosong Sun, James Evans, Xing Xie

Abstract

As large language models (LLMs) are gradually integrated into human daily life, assessing their underlying values becomes essential for understanding their risks and alignment with specific preferences. Despite growing efforts, current value evaluation methods face two key challenges. C1. Evaluation Validity: Static benchmarks fail to reflect intended values or yield informative results due to data contamination or a ceiling effect. C2. Result Interpretation: They typically reduce the pluralistic and often incommensurable values to one-dimensional scores, which hinders users from gaining meaningful insights and guidance. To address these challenges, we present Value Compass Benchmarks, the first dynamic, online and interactive platform specially devised for comprehensive value diagnosis of LLMs. It (1) grounds evaluations in multiple basic value systems from social science; (2) develops a generative evolving evaluation paradigm that automatically creates real-world test items co-evolving with ever-advancing LLMs; (3) offers multi-faceted result interpretation, including (i) fine-grained scores and case studies across 27 value dimensions for 33 leading LLMs, (ii) customized comparisons, and (iii) visualized analysis of LLMs’ alignment with cultural values. We hope Value Compass Benchmarks serves as a navigator for further enhancing LLMs’ safety and alignment, benefiting their responsible and adaptive development.

Anthology ID:: 2025.acl-demo.64
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Pushkar Mishra, Smaranda Muresan, Tao Yu
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 666–678
Language:
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.64/
DOI:
Bibkey:
Cite (ACL):: Jing Yao, Xiaoyuan Yi, Shitong Duan, Jindong Wang, Yuzhuo Bai, Muhua Huang, Yang Ou, Scarlett Li, Peng Zhang, Tun Lu, Zhicheng Dou, Maosong Sun, James Evans, and Xing Xie. 2025. Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 666–678, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation (Yao et al., ACL 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.64.pdf