Automatic Evaluate Dialogue Appropriateness by Using Dialogue Act

Bao Chen, Yuanjie Wang, Zeming Liu, Yuhang Guo


Abstract
Evaluating dialogue systems requires assessing several aspects, among which appropriateness is significant as a core element of communicative language competence. However, current evaluations rely heavily on human judgments, which are time-consuming, labor-intensive, prone to bias, and lacking in objectivity. In this paper, we introduce Dialogue Act Appropriateness (DAA), a novel method that uses the underlying patterns of dialogue act transitions to evaluate the appropriateness of chatbot responses. We learn transition patterns from human-human dialogue corpora and evaluate chatbot appropriateness by measuring how similar a chatbot's transition patterns are to those observed in human-human dialogues. To validate DAA, we annotate a test dataset by manually rating the appropriateness of dialogues from multiple chatbot systems. The experimental results show a strong correlation between our evaluation metric and human ratings, supporting the reliability of DAA as a measure of dialogue appropriateness.
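The idea of scoring a chatbot against human dialogue act transitions can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the model or the similarity measure, so the bigram transition model, the add-one smoothing, the mean log-probability score, the act labels, and the function names `transition_probs` and `appropriateness_score` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def transition_probs(dialogues, smoothing=1.0):
    """Estimate P(next_act | prev_act) from sequences of dialogue-act labels.

    Assumption: a first-order (bigram) model with additive smoothing.
    """
    acts = sorted({act for dialogue in dialogues for act in dialogue})
    counts = defaultdict(Counter)
    for dialogue in dialogues:
        for prev, nxt in zip(dialogue, dialogue[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in acts:
        total = sum(counts[prev].values()) + smoothing * len(acts)
        probs[prev] = {
            nxt: (counts[prev][nxt] + smoothing) / total for nxt in acts
        }
    return probs


def appropriateness_score(dialogue, probs, floor=1e-6):
    """Mean log-probability of a dialogue's act transitions under the human model.

    Higher (closer to zero) means the transitions look more human-like.
    Transitions involving unseen acts fall back to a small floor probability.
    """
    pairs = list(zip(dialogue, dialogue[1:]))
    if not pairs:
        return 0.0
    log_probs = [math.log(probs.get(p, {}).get(n, floor)) for p, n in pairs]
    return sum(log_probs) / len(log_probs)


# Hypothetical human-human corpus, already labeled with dialogue acts.
human_dialogues = [
    ["question", "answer", "thanks"],
    ["question", "answer", "question", "answer"],
    ["greeting", "greeting", "question", "answer"],
]
human_model = transition_probs(human_dialogues)

# Two hypothetical chatbot exchanges: answering a question vs. replying
# to a question with another question.
typical = appropriateness_score(["question", "answer"], human_model)
atypical = appropriateness_score(["question", "question"], human_model)
print(typical > atypical)
```

In this toy corpus, a question followed by an answer is a frequent human transition, so it scores higher than a question followed by another question. A real system would also need a dialogue act classifier to label chatbot turns, which this sketch leaves out.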
Anthology ID: 2023.findings-emnlp.492
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7361–7372
URL: https://aclanthology.org/2023.findings-emnlp.492
DOI: 10.18653/v1/2023.findings-emnlp.492
Cite (ACL): Bao Chen, Yuanjie Wang, Zeming Liu, and Yuhang Guo. 2023. Automatic Evaluate Dialogue Appropriateness by Using Dialogue Act. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7361–7372, Singapore. Association for Computational Linguistics.
Cite (Informal): Automatic Evaluate Dialogue Appropriateness by Using Dialogue Act (Chen et al., Findings 2023)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-emnlp.492.pdf