Abstract
We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice’s Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the ‘I don’t know’ problem, in which a dialog system produces generic responses. The linguistically motivated RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, ‘I don’t know’ is preferred over the reference the majority of the time, but this can be reduced to less than 5% with hyperparameter tuning. RUQ allows for the direct analysis of the ‘I don’t know’ problem, which has been addressed but not analyzed by prior work.
- Anthology ID:
- 2021.naacl-main.450
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5659–5670
- URL:
- https://aclanthology.org/2021.naacl-main.450
- DOI:
- 10.18653/v1/2021.naacl-main.450
- Cite (ACL):
- Huda Khayrallah and João Sedoc. 2021. Measuring the ‘I don’t know’ Problem through the Lens of Gricean Quantity. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5659–5670, Online. Association for Computational Linguistics.
- Cite (Informal):
- Measuring the ‘I don’t know’ Problem through the Lens of Gricean Quantity (Khayrallah & Sedoc, NAACL 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.naacl-main.450.pdf
- Data:
- DailyDialog, FLoRes
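The abstract describes RUQ as a comparison between the model's score for a generic response (such as ‘I don’t know’) and its score for the reference response, summarized as how often the generic response is preferred. The following is a minimal Python sketch of that comparison, not the authors' implementation. It assumes each score is a sequence log-probability computed by the same model for the same dialog context. It also assumes that "preferred" means strictly higher, so ties count as not preferred. The optional length normalization is a hypothetical choice added here for illustration. The function names `sequence_score` and `ruq_preference_rate` are likewise hypothetical.

```python
from typing import Sequence


def sequence_score(token_logprobs: Sequence[float], length_normalize: bool = False) -> float:
    """Turn per-token log-probabilities into one sequence score.

    Hypothetical helper: the sum is the sequence log-probability;
    length_normalize=True returns the mean per-token log-probability instead.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def ruq_preference_rate(generic_scores: Sequence[float],
                        reference_scores: Sequence[float]) -> float:
    """Fraction of examples where the generic response outscores the reference.

    Assumption: scores are paired by example (same input context) and
    ties are counted as the reference being preferred.
    """
    if len(generic_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    if not generic_scores:
        raise ValueError("need at least one example")
    preferred = sum(1 for g, r in zip(generic_scores, reference_scores) if g > r)
    return preferred / len(generic_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    generic = [-2.0, -5.0, -1.0]
    reference = [-3.0, -4.0, -6.0]
    print(ruq_preference_rate(generic, reference))  # generic wins 2 of 3 examples
```

Under this reading, a rate above 0.5 corresponds to the abstract's observation that ‘I don’t know’ is preferred over the reference the majority of the time. Whether to sum or average token log-probabilities matters in practice, because shorter generic responses tend to receive higher summed log-probabilities.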