MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings

Amila Ferron, Amber Shore, Ekata Mitra, Ameeta Agrawal

Abstract
As dialogue systems become more popular, evaluation of their response quality gains importance. Engagingness highly correlates with overall quality and creates a sense of connection that gives human participants a more fulfilling experience. Although qualities like coherence and fluency are readily measured with well-worn automatic metrics, evaluating engagingness often relies on human assessment, which is a costly and time-consuming process. Existing automatic engagingness metrics evaluate the response without the conversation history, are designed for one dataset, or have limited correlation with human annotations. Furthermore, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, multilingual evaluation capabilities are essential. We propose that large language models (LLMs) may be used for evaluation of engagingness in dialogue through prompting, and ask how prompt constructs and translated prompts compare in a multilingual setting. We provide a prompt-design taxonomy for engagingness and find that using selected prompt elements with LLMs, including our comprehensive definition of engagingness, outperforms state-of-the-art methods on evaluation of engagingness in dialogue across multiple languages.
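Since the paper's core idea is to elicit an engagingness score by prompting an LLM with a definition of engagingness plus the conversation context, a minimal sketch of that setup is given below. The prompt wording, definition text, 0-100 scale, and model name are hypothetical placeholders rather than the paper's actual MEEP prompts, and the use of the `openai` Python client is an assumption for illustration only.

```python
# Minimal sketch of prompt-based engagingness scoring. The definition text,
# prompt wording, and 0-100 scale are illustrative placeholders, NOT the
# exact MEEP prompts from the paper. Assumes the `openai` client
# (`pip install openai`) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for the paper's comprehensive definition of engagingness.
ENGAGINGNESS_DEFINITION = (
    "An engaging response is interesting and rewarding to talk to: it shows "
    "attentiveness to the conversation so far, invites further dialogue, and "
    "creates a sense of connection with the other speaker."
)

def score_engagingness(history: list[str], response: str, model: str = "gpt-4") -> float:
    """Ask an LLM to rate how engaging `response` is, given the conversation history."""
    dialogue = "\n".join(history)
    prompt = (
        f"{ENGAGINGNESS_DEFINITION}\n\n"
        f"Conversation so far:\n{dialogue}\n\n"
        f"Response to evaluate:\n{response}\n\n"
        "On a scale from 0 (not engaging) to 100 (highly engaging), "
        "how engaging is the response? Reply with a single number."
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    return float(completion.choices[0].message.content.strip())

if __name__ == "__main__":
    history = ["A: I just got back from a hiking trip in Patagonia."]
    print(score_engagingness(history, "B: That sounds amazing! What was the most memorable part?"))
```

Note that passing the full conversation history into the prompt reflects the paper's observation that existing metrics which score a response in isolation miss context; translating the prompt itself is how the multilingual comparison in the paper would plug into a setup like this.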
Anthology ID: 2023.findings-emnlp.137
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2078–2100
URL: https://aclanthology.org/2023.findings-emnlp.137
DOI: 10.18653/v1/2023.findings-emnlp.137
Cite (ACL): Amila Ferron, Amber Shore, Ekata Mitra, and Ameeta Agrawal. 2023. MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2078–2100, Singapore. Association for Computational Linguistics.
Cite (Informal): MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings (Ferron et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.137.pdf