RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Pengjie Ren, Zhaochun Ren
Abstract
Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., many appropriate responses exist beyond the single golden response. As of now, automatic evaluation methods correlate poorly with human judgments, while reliable human evaluation is time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under a multi-task learning framework, which leverages a pre-created utterance as a reference, rather than only the gold response, to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict their overall scores. Moreover, an auxiliary response generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets with additional human-annotated rated responses beyond the single golden response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method: Pearson, Spearman, and Kendall correlations with human evaluation outperform state-of-the-art baselines.
- Anthology ID:
- 2023.acl-long.719
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 12856–12875
- URL:
- https://aclanthology.org/2023.acl-long.719
- Cite (ACL):
- Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Pengjie Ren, and Zhaochun Ren. 2023. RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12856–12875, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue (Shi et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2023.acl-long.719.pdf
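The abstract's multi-task design can be illustrated with a toy sketch: a shared encoder feeds both (1) a scoring head that explicitly compares the candidate response against the rated reference and (2) an auxiliary response-generation head. Everything below is a hypothetical simplification for illustration only; the class name, dimensions, and random-projection "encoder" are assumptions, not the paper's actual architecture (which builds on pretrained transformer encoders).

```python
import math
import random

class RadeSketch:
    """Toy illustration of reference-assisted scoring with a shared encoder.
    Hypothetical simplification; not the paper's real model."""

    def __init__(self, dim=8, seed=0):
        rng = random.Random(seed)
        # Toy shared "encoder": a fixed random projection standing in for
        # the transformer encoder shared by both tasks.
        self.enc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        self.score_w = [rng.uniform(-1, 1) for _ in range(dim)]

    def encode(self, x):
        # Shared representation used by both the scoring and generation heads.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]

    def score(self, context, reference, candidate):
        # Reference-assisted scoring: compare the candidate with the
        # reference (element-wise difference), conditioned on the context.
        h_ctx = self.encode(context)
        h_ref = self.encode(reference)
        h_cand = self.encode(candidate)
        joint = [c + (a - b) for c, a, b in zip(h_ctx, h_cand, h_ref)]
        raw = sum(w * j for w, j in zip(self.score_w, joint))
        return 1.0 / (1.0 + math.exp(-raw))  # squash to (0, 1)

    def generation_features(self, context):
        # Auxiliary response-generation head sharing the same encoder;
        # per the abstract, this task enhances score prediction.
        return self.encode(context)

# Usage: feature vectors stand in for encoded dialogue turns.
model = RadeSketch()
s = model.score([0.1] * 8, [0.5] * 8, [0.4] * 8)
```

The key design point the sketch mirrors is that the score depends on the *difference* between candidate and reference, so the reference anchors the judgment instead of relying solely on similarity to a single gold response.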