TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models

Yanbo Wang, Jiayi Ye, Siyuan Wu, Chujie Gao, Yue Huang, Xiuying Chen, Yue Zhao, Xiangliang Zhang


Abstract
Ensuring the trustworthiness of Generative Foundation Models (GenFMs) is a pressing challenge as they gain widespread use. Existing evaluation toolkits are often limited in scope, dynamism, and flexibility. This paper introduces TRUSTEVAL, a dynamic and comprehensive toolkit for evaluating the trustworthiness of GenFMs across multiple dimensions. TRUSTEVAL supports both dynamic dataset generation and evaluation, and is designed to be comprehensive, usable, and flexible. It integrates diverse generative models, datasets, evaluation methods, and metrics, along with inference efficiency enhancement and evaluation report generation. Through case studies, we demonstrate TRUSTEVAL’s potential to advance the trustworthiness evaluation of GenFMs.
Anthology ID:
2025.naacl-demo.8
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Nouha Dziri, Sean (Xiang) Ren, Shizhe Diao
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
70–84
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.8/
Cite (ACL):
Yanbo Wang, Jiayi Ye, Siyuan Wu, Chujie Gao, Yue Huang, Xiuying Chen, Yue Zhao, and Xiangliang Zhang. 2025. TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), pages 70–84, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models (Wang et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.8.pdf