What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

Sarik Ghazarian; Behnam Hedayatnia; Alexandros Papangelis; Yang Liu; Dilek Hakkani-Tur

doi:10.18653/v1/2022.findings-acl.331

Abstract

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human-annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
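The weak-supervision idea in the abstract can be illustrated with a minimal sketch: each system response is paired with signals read off the user turn that follows it. This is not the authors' implementation. The `Turn` class, the word lists, and the functions `sentiment_score`, `ends_conversation`, and `weak_labels` are hypothetical names introduced for illustration, and the lexicon lookup is a toy stand-in for whatever sentiment model is actually used.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicons (assumption): a real system would use a trained
# sentiment classifier rather than word lists.
NEGATIVE_WORDS = {"wrong", "stupid", "bad", "boring", "hate"}
POSITIVE_WORDS = {"great", "thanks", "cool", "love", "nice"}
STOP_PHRASES = {"stop", "bye", "goodbye", "exit"}


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


def sentiment_score(text: str) -> int:
    """Count positive minus negative lexicon hits in the utterance."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    positive = sum(t in POSITIVE_WORDS for t in tokens)
    negative = sum(t in NEGATIVE_WORDS for t in tokens)
    return positive - negative


def ends_conversation(text: str) -> bool:
    """True when the whole utterance is an explicit stop phrase."""
    return text.strip(" .,!?").lower() in STOP_PHRASES


def weak_labels(dialog: List[Turn]) -> List[dict]:
    """Attach next-user-turn signals to each system response as proxy labels."""
    examples = []
    for current, following in zip(dialog, dialog[1:]):
        if current.speaker == "system" and following.speaker == "user":
            examples.append({
                "response": current.text,
                "next_user_sentiment": sentiment_score(following.text),
                "user_ended": ends_conversation(following.text),
            })
    return examples
```

Records produced this way could serve as training targets for a response-quality model without manual turn-level annotation, which is the trade-off the abstract describes.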

Anthology ID: 2022.findings-acl.331
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4194–4204
URL: https://aclanthology.org/2022.findings-acl.331
DOI: 10.18653/v1/2022.findings-acl.331
Cite (ACL): Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, and Dilek Hakkani-Tur. 2022. What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 4194–4204, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation (Ghazarian et al., Findings 2022)
PDF: https://preview.aclanthology.org/landing_page/2022.findings-acl.331.pdf
Video: https://preview.aclanthology.org/landing_page/2022.findings-acl.331.mp4
Code: alexa/conture
Data: FED
