How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau
- Anthology ID: D16-1230
- Volume: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2016
- Address: Austin, Texas
- Editors: Jian Su, Kevin Duh, Xavier Carreras
- Venue: EMNLP
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 2122–2132
- URL: https://aclanthology.org/D16-1230
- DOI: 10.18653/v1/D16-1230
- Cite (ACL): Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132, Austin, Texas. Association for Computational Linguistics.
- Cite (Informal): How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation (Liu et al., EMNLP 2016)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/D16-1230.pdf