Towards Comprehensive Evaluation of Open-Source Language Models: A Multi-Dimensional, User-Driven Approach

Qingchen Yu

Abstract

With rapid advancements in large language models (LLMs) across artificial intelligence, machine learning, and data science, there is a growing need for evaluation frameworks that go beyond traditional performance metrics. Conventional methods focus mainly on accuracy and computational metrics, often neglecting user experience and community interaction, both key elements in open-source environments. This paper introduces a multi-dimensional, user-centered evaluation framework, integrating metrics like User Engagement Index (UEI), Community Response Rate (CRR), and a Time Weight Factor (TWF) to assess LLMs’ real-world impact. Additionally, we propose an adaptive weighting mechanism using Bayesian optimization to dynamically adjust metric weights for more accurate model evaluation. Experimental results confirm that our framework effectively identifies models with strong user engagement and community support, offering a balanced, data-driven approach to open-source LLM evaluation. This framework serves as a valuable tool for developers and researchers in selecting and improving open-source models.

Anthology ID: 2025.gem-1.1
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Ofir Arviv, Miruna Clinciu, Kaustubh Dhole, Rotem Dror, Sebastian Gehrmann, Eliya Habba, Itay Itzhak, Simon Mille, Yotam Perlitz, Enrico Santus, João Sedoc, Michal Shmueli Scheuer, Gabriel Stanovsky, Oyvind Tafjord
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 1–7
URL: https://preview.aclanthology.org/nschneid-patch-1/2025.gem-1.1/
Cite (ACL): Qingchen Yu. 2025. Towards Comprehensive Evaluation of Open-Source Language Models: A Multi-Dimensional, User-Driven Approach. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 1–7, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal): Towards Comprehensive Evaluation of Open-Source Language Models: A Multi-Dimensional, User-Driven Approach (Yu, GEM 2025)
PDF: https://preview.aclanthology.org/nschneid-patch-1/2025.gem-1.1.pdf
