SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation

Yufei Tian, Jiao Sun, Nanyun Peng, Zizhao Zhang


Abstract
As language models evolve to tackle complex, multifaceted tasks, their evaluation must adapt to capture this intricacy. A granular, skill-specific understanding of model capabilities can empower researchers to make informed model development plans. In this paper, we introduce SkillVerse, an unsupervised tree-structured diagnosis framework for understanding model proficiency in specific abilities. With an LLM as a judge, SkillVerse first critiques the model responses and then organizes them into a hierarchical structure termed a dendrogram. Because it reports proficiency at arbitrary levels of granularity, SkillVerse can flexibly produce insights into the behaviors of modern large models. We also demonstrate its efficacy in two downstream tasks: 1) improving model in-context learning by 25% using a tree-search algorithm to select more informative few-shot demonstrations, and 2) accurately predicting new model weaknesses with a 55% success rate, 22% higher than without SkillVerse.
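To make the idea of reading proficiency at arbitrary levels of a tree concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each judged response has already been reduced to a success/failure flag and attached to a node of a hand-built skill tree; the class `SkillNode`, the function `weakest`, the parameter `min_support`, and the example skill names are all hypothetical. The paper's actual clustering and tree-search procedures are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """Hypothetical skill-tree node; `outcomes` holds judged results (True = success)."""
    name: str
    children: list[SkillNode] = field(default_factory=list)
    outcomes: list[bool] = field(default_factory=list)

    def counts(self) -> tuple[int, int]:
        """Return (successes, total) over this node and all of its descendants."""
        wins, total = sum(self.outcomes), len(self.outcomes)
        for child in self.children:
            child_wins, child_total = child.counts()
            wins += child_wins
            total += child_total
        return wins, total

    def proficiency(self) -> float | None:
        """Success rate at this level of granularity, or None if there is no data."""
        wins, total = self.counts()
        return wins / total if total else None


def weakest(root: SkillNode, min_support: int = 2) -> tuple[str, float] | None:
    """Depth-first search for the node with the lowest success rate,
    ignoring nodes backed by fewer than `min_support` judged responses."""
    best = None
    stack = [root]
    while stack:
        node = stack.pop()
        wins, total = node.counts()
        if total >= min_support:
            rate = wins / total
            if best is None or rate < best[1]:
                best = (node.name, rate)
        stack.extend(node.children)
    return best


# Illustrative data only: invented skills and outcomes, not results from the paper.
arithmetic = SkillNode("arithmetic", outcomes=[True, False, False])
logic = SkillNode("logic", outcomes=[True, True])
reasoning = SkillNode("reasoning", children=[arithmetic, logic])
style = SkillNode("style", outcomes=[True, True, True, False])
root = SkillNode("all skills", children=[reasoning, style])

print(root.proficiency(), reasoning.proficiency(), weakest(root))
```

Aggregating counts rather than averaging child rates keeps the score at an internal node weighted by how many responses fall under each child, which is one reasonable design choice among several.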
Anthology ID:
2025.acl-long.437
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8917–8933
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.437/
Cite (ACL):
Yufei Tian, Jiao Sun, Nanyun Peng, and Zizhao Zhang. 2025. SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8917–8933, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation (Tian et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.437.pdf