How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau


Anthology ID:
D16-1230
Volume:
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2016
Address:
Austin, Texas
Editors:
Jian Su, Kevin Duh, Xavier Carreras
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2122–2132
URL:
https://aclanthology.org/D16-1230
DOI:
10.18653/v1/D16-1230
Cite (ACL):
Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132, Austin, Texas. Association for Computational Linguistics.
Cite (Informal):
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation (Liu et al., EMNLP 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/D16-1230.pdf
Attachment:
 D16-1230.Attachment.zip
Video:
 https://preview.aclanthology.org/nschneid-patch-2/D16-1230.mp4