Fusion-Eval: Integrating Assistant Evaluators with LLMs

Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng


Abstract
Evaluating natural language generation (NLG) systems automatically poses significant challenges. Recent studies have employed large language models (LLMs) as reference-free metrics for NLG evaluation, enhancing adaptability to new tasks. However, these methods still show lower correspondence with human judgments compared to specialized neural evaluators. In this paper, we introduce “Fusion-Eval”, an innovative approach that leverages LLMs to integrate insights from various assistant evaluators. The LLM is given the example to evaluate along with scores from the assistant evaluators, each of which specializes in assessing distinct aspects of responses. Fusion-Eval achieves a 0.962 system-level Kendall-Tau correlation with humans on SummEval and a 0.744 turn-level Spearman correlation on TopicalChat, both significantly higher than baseline methods. These results highlight Fusion-Eval’s significant potential in the realm of natural language system evaluation.
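The abstract describes an LLM that receives the example to evaluate together with the assistant evaluators' scores and produces the final judgment. The sketch below illustrates one plausible way such a fused evaluation prompt could be assembled; the function name, evaluator labels, and scoring instructions are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a Fusion-Eval-style prompt: the example under
# evaluation and the assistant evaluators' scores are combined into a single
# prompt for an LLM judge. Names and wording are illustrative only.

def build_fusion_eval_prompt(source: str, response: str,
                             assistant_scores: dict[str, float]) -> str:
    # Render each assistant evaluator's score as a labeled hint for the LLM.
    score_lines = "\n".join(
        f"- {name}: {score:.3f}" for name, score in assistant_scores.items()
    )
    return (
        "You are evaluating a generated response.\n\n"
        f"Source:\n{source}\n\n"
        f"Response:\n{response}\n\n"
        "Assistant evaluator scores (each covers a distinct aspect):\n"
        f"{score_lines}\n\n"
        "Considering the response and the scores above, rate the overall "
        "quality from 1 (worst) to 5 (best) and explain briefly."
    )

if __name__ == "__main__":
    prompt = build_fusion_eval_prompt(
        source="Article text to be summarized ...",
        response="Candidate summary ...",
        assistant_scores={"coherence_evaluator": 0.81,
                          "consistency_evaluator": 0.64},
    )
    print(prompt)  # this prompt would then be sent to an LLM scorer
```

The key idea reflected here is that the assistant evaluators' aspect-level scores are surfaced to the LLM as additional evidence rather than averaged numerically, letting the LLM reconcile them with its own reading of the response.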
Anthology ID:
2024.emnlp-industry.18
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
225–238
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-industry.18/
DOI:
10.18653/v1/2024.emnlp-industry.18
Cite (ACL):
Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, and Lei Meng. 2024. Fusion-Eval: Integrating Assistant Evaluators with LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 225–238, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Fusion-Eval: Integrating Assistant Evaluators with LLMs (Shu et al., EMNLP 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-industry.18.pdf
Poster:
2024.emnlp-industry.18.poster.pdf