uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems

Yuma Tsuta; Naoki Yoshinaga; Masashi Toyoda

doi:10.18653/v1/2020.acl-srw.27

Abstract

Because open-domain dialogues allow diverse responses, basic reference-based metrics such as BLEU do not work well unless we prepare a massive reference set of high-quality responses for input utterances. To reduce this burden, a human-aided, uncertainty-aware metric, ΔBLEU, has been proposed; it embeds human judgment on the quality of reference outputs into the computation of multiple-reference BLEU. In this study, we instead propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. This method first collects diverse reference responses from massive dialogue data and then annotates them with quality judgments using a neural network trained on automatically collected training data. Experimental results on massive Twitter data confirmed that υBLEU is comparable to ΔBLEU in terms of its correlation with human judgment and that the state-of-the-art automatic evaluation method, RUBER, is improved by integrating υBLEU.
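To make the idea of embedding reference-quality judgments into multiple-reference BLEU more concrete, the following is a minimal sketch of a weighted, clipped n-gram precision for a single n-gram order. It is an illustration in the spirit of the approach described above, not the authors' implementation: the function names `ngrams` and `weighted_precision`, the example sentences, and the weight values are all hypothetical. The weights stand in for per-reference quality scores, which would come from human annotators in ΔBLEU or from a trained neural network in υBLEU; the sketch also assumes at least one weight is positive and omits the brevity penalty and the combination across n-gram orders.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_precision(hypothesis, references, weights, n=1):
    """Weighted clipped n-gram precision over multiple references.

    Each reference carries a quality weight (hypothetical values here).
    A matched n-gram is credited with the best weighted clipped count
    among the references that contain it.
    """
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = [ngrams(ref, n) for ref in references]

    numerator = 0.0
    denominator = 0.0
    for gram, h_count in hyp_counts.items():
        # Weighted clipped matches from every reference containing this n-gram.
        matches = [
            w * min(h_count, rc[gram])
            for rc, w in zip(ref_counts, weights)
            if gram in rc
        ]
        if matches:
            numerator += max(matches)
        # Normalizer: the credit this n-gram would earn from the best reference.
        denominator += max(w * h_count for w in weights)

    # Assumption: return 0.0 when nothing can be normalized (e.g. empty input).
    return numerator / denominator if denominator > 0 else 0.0


hypothesis = "i love cats".split()
references = ["i love dogs".split(), "cats are great".split()]
weights = [1.0, 0.5]  # hypothetical quality scores for the two references

score = weighted_precision(hypothesis, references, weights)
```

With uniform weights this reduces to ordinary multi-reference clipped precision; lowering the weight of a reference reduces the credit given to n-grams that only that reference supports.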

Anthology ID: 2020.acl-srw.27
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month: July
Year: 2020
Address: Online
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 199–206
URL: https://aclanthology.org/2020.acl-srw.27
DOI: 10.18653/v1/2020.acl-srw.27
Cite (ACL): Yuma Tsuta, Naoki Yoshinaga, and Masashi Toyoda. 2020. uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 199–206, Online. Association for Computational Linguistics.
Cite (Informal): uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems (Tsuta et al., ACL 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.acl-srw.27.pdf
Video: http://slideslive.com/38928654
