Xinwei Shi
2025
Semantic-Eval: A Semantic Comprehension Evaluation Framework for Large Language Models Generation without Training
Shusheng Li | Jiale Li | Yifei Qu | Xinwei Shi | Yanliang Guo | Ziyi He | Yubo Wang | Wenjun Tan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the increasing prominence of large language models (LLMs), evaluating their text-generation capabilities has become an essential research challenge. Although LLM-based evaluation methods exhibit robust performance, the inherent stochasticity of the LLM generation process introduces uncertainty into their alignment with human preferences. To address this limitation, we propose Semantic-Eval, the first training-free framework designed to assess LLM-generated text based on semantic understanding. The framework computes pairwise semantic similarity between texts to evaluate the interdependence of semantic units, integrating a graph-based weighting mechanism to account for the differential contributions of individual sentences. A pre-trained natural language inference (NLI) model is also incorporated to mitigate potential semantic relationship biases. We evaluate Semantic-Eval across eight datasets that span four common NLP tasks. The experimental results indicate that Semantic-Eval surpasses traditional N-gram and BERT-based evaluation metrics, aligning more closely with human judgments and demonstrating a higher correlation than smaller LLMs, although it slightly lags behind GPT-4. Finally, we demonstrate the effectiveness of Semantic-Eval in evaluating the generation quality of 13 large language models. The code is publicly available at https://github.com/LssTry/Semantic-Eval.
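The abstract only outlines the pipeline (pairwise sentence similarity, graph-based sentence weighting, aggregation into a score), so the following is a minimal illustrative sketch rather than the paper's implementation. It assumes pre-computed, L2-normalized sentence embeddings as input, substitutes a TextRank-style power iteration for the graph-based weighting, uses a greedy best-match aggregation, and omits the NLI correction step; the function name, damping factor, and aggregation rule are assumptions for illustration only.

```python
import numpy as np

def semantic_eval_sketch(cand_emb, ref_emb, damping=0.85, iters=50):
    """Illustrative sketch only (not the released implementation).

    cand_emb: (n, d) L2-normalized embeddings of candidate sentences.
    ref_emb:  (m, d) L2-normalized embeddings of reference sentences.
    """
    # Pairwise cosine similarity between candidate and reference sentences.
    sim = cand_emb @ ref_emb.T                      # shape (n, m)

    # Graph-based weighting of reference sentences (TextRank-style power
    # iteration) to approximate the "differential contributions of
    # individual sentences" mentioned in the abstract.
    adj = ref_emb @ ref_emb.T                       # sentence-similarity graph
    np.fill_diagonal(adj, 0.0)
    adj = np.clip(adj, 0.0, None)                   # keep non-negative edges
    trans = adj / (adj.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic

    m = len(ref_emb)
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        w = (1 - damping) / m + damping * (trans.T @ w)

    # Credit each candidate sentence with its best-matching reference
    # sentence, weighted by that sentence's graph importance (assumed
    # aggregation; the NLI-based bias correction is omitted here).
    best_sim = sim.max(axis=1)
    best_idx = sim.argmax(axis=1)
    return float(np.sum(best_sim * w[best_idx]) / (w[best_idx].sum() + 1e-12))
```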